* [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
@ 2011-09-29 2:00 Liu Bo
2011-09-29 4:25 ` Yan, Zheng
0 siblings, 1 reply; 11+ messages in thread
From: Liu Bo @ 2011-09-29 2:00 UTC (permalink / raw)
To: linux-btrfs; +Cc: josef, chris.mason, lizf, miaox, zheng.z.yan, dave
The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.
But there are two cases that lead to tree corruption:
1) multi-threaded snapshots can commit several snapshots in one transaction,
and this may change the src root while processing the following pending
snapshots, which corrupts the earlier snapshots;
2) the free inode cache was changing the roots when it wrote out the cache,
which leads to corruption.
This fixes things by making sure we force COW of the block after we create a
snapshot while committing a transaction, so any changes to the roots
will result in COW, and all the fs roots and snapshot roots stay
consistent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/ctree.c | 17 ++++++++++++++++-
fs/btrfs/ctree.h | 2 ++
fs/btrfs/transaction.c | 8 ++++++++
3 files changed, 26 insertions(+), 1 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 011cab3..49dad7d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -514,10 +514,25 @@ static inline int should_cow_block(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *buf)
{
+ /* ensure we can see the force_cow */
+ smp_rmb();
+
+ /*
+ * We do not need to cow a block if
+ * 1) this block was allocated in this transaction and not yet written;
+ * 2) this block does not belong to TREE_RELOC tree;
+ * 3) the root is not forced COW.
+ *
+ * What is forced COW:
+ * when we create a snapshot while committing the transaction,
+ * after we've finished copying the src root, we must COW the shared
+ * block to ensure metadata consistency.
+ */
if (btrfs_header_generation(buf) == trans->transid &&
!btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN) &&
!(root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID &&
- btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
+ btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)) &&
+ !root->force_cow)
return 0;
return 1;
}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 03912c5..bece0df 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1225,6 +1225,8 @@ struct btrfs_root {
* for stat. It may be used for more later
*/
dev_t anon_dev;
+
+ int force_cow;
};
struct btrfs_ioctl_defrag_range_args {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7dc36fa..bf6e2b3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -816,6 +816,10 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
btrfs_save_ino_cache(root, trans);
+ /* see comments in should_cow_block() */
+ root->force_cow = 0;
+ smp_wmb();
+
if (root->commit_root != root->node) {
mutex_lock(&root->fs_commit_mutex);
switch_commit_root(root);
@@ -976,6 +980,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
btrfs_tree_unlock(old);
free_extent_buffer(old);
+ /* see comments in should_cow_block() */
+ root->force_cow = 1;
+ smp_wmb();
+
btrfs_set_root_node(new_root_item, tmp);
/* record when the snapshot was created in key.offset */
key.offset = trans->transid;
--
1.6.5.2
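For readers following the thread, the decision that the patch changes in should_cow_block() can be sketched in user space as below. This is only a minimal model under stated assumptions, not the kernel code: struct toy_block, struct toy_root and toy_should_cow_block() are hypothetical stand-ins for the btrfs structures, and the smp_rmb()/smp_wmb() pairing from the patch is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real btrfs types. */
struct toy_block {
	uint64_t generation;	/* transaction that allocated or COWed the block */
	bool written;		/* models BTRFS_HEADER_FLAG_WRITTEN */
	bool reloc;		/* models BTRFS_HEADER_FLAG_RELOC */
};

struct toy_root {
	bool is_reloc_tree;	/* models objectid == BTRFS_TREE_RELOC_OBJECTID */
	int force_cow;		/* the flag added by the patch */
};

/*
 * Returns 0 if the block may be changed in place, 1 if it must be COWed.
 * The first three conditions mirror the pre-patch check; the last one is
 * the new force_cow test.
 */
static int toy_should_cow_block(uint64_t transid, const struct toy_root *root,
				const struct toy_block *buf)
{
	if (buf->generation == transid &&
	    !buf->written &&
	    !(!root->is_reloc_tree && buf->reloc) &&
	    !root->force_cow)
		return 0;
	return 1;
}
```

In this model, a block allocated in the running transaction and not yet written can normally be changed in place; once force_cow is set on its root, the same block is reported as needing COW.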
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 2:00 [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush Liu Bo
@ 2011-09-29 4:25 ` Yan, Zheng
2011-09-29 6:47 ` Miao Xie
2011-09-29 8:18 ` Liu Bo
0 siblings, 2 replies; 11+ messages in thread
From: Yan, Zheng @ 2011-09-29 4:25 UTC (permalink / raw)
To: Liu Bo; +Cc: linux-btrfs, josef, chris.mason, lizf, miaox, dave
On 09/29/2011 10:00 AM, Liu Bo wrote:
> The btrfs snapshotting code requires that once a root has been
> snapshotted, we don't change it during a commit.
>
> But there are two cases that lead to tree corruption:
>
> 1) multi-threaded snapshots can commit several snapshots in one transaction,
> and this may change the src root while processing the following pending
> snapshots, which corrupts the earlier snapshots;
>
> 2) the free inode cache was changing the roots when it wrote out the cache,
> which leads to corruption.
>
For case 2, the free inode cache of a newly created snapshot is invalid.
So it's better to avoid modifying snapshotted trees.
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 4:25 ` Yan, Zheng
@ 2011-09-29 6:47 ` Miao Xie
2011-09-29 6:46 ` Yan, Zheng
2011-09-29 8:18 ` Liu Bo
1 sibling, 1 reply; 11+ messages in thread
From: Miao Xie @ 2011-09-29 6:47 UTC (permalink / raw)
To: Yan, Zheng; +Cc: Liu Bo, linux-btrfs, josef, chris.mason, lizf, dave
On Thu, 29 Sep 2011 12:25:56 +0800, Yan, Zheng wrote:
> On 09/29/2011 10:00 AM, Liu Bo wrote:
>> The btrfs snapshotting code requires that once a root has been
>> snapshotted, we don't change it during a commit.
>>
>> But there are two cases that lead to tree corruption:
>>
>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>> and this may change the src root while processing the following pending
>> snapshots, which corrupts the earlier snapshots;
>>
>> 2) the free inode cache was changing the roots when it wrote out the cache,
>> which leads to corruption.
>>
> For case 2, the free inode cache of a newly created snapshot is invalid.
> So it's better to avoid modifying snapshotted trees.
I think this behavior, writing out the inode cache after creating the snapshot,
was implemented on purpose. Some inode IDs are freed after their tree is
committed, so the newly created snapshot must cache the inode IDs again to
guarantee the inode cache is right, even if we write out the inode cache of
the trees before they are snapshotted. So it is unnecessary to write out the
inode cache before creating the snapshot.
Li, am I right?
Thanks
Miao
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 6:47 ` Miao Xie
@ 2011-09-29 6:46 ` Yan, Zheng
2011-09-29 7:19 ` Miao Xie
0 siblings, 1 reply; 11+ messages in thread
From: Yan, Zheng @ 2011-09-29 6:46 UTC (permalink / raw)
To: miaox; +Cc: Liu Bo, linux-btrfs, josef, chris.mason, lizf, dave
On 09/29/2011 02:47 PM, Miao Xie wrote:
> On Thu, 29 Sep 2011 12:25:56 +0800, Yan, Zheng wrote:
>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>> The btrfs snapshotting code requires that once a root has been
>>> snapshotted, we don't change it during a commit.
>>>
>>> But there are two cases that lead to tree corruption:
>>>
>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>> and this may change the src root while processing the following pending
>>> snapshots, which corrupts the earlier snapshots;
>>>
>>> 2) the free inode cache was changing the roots when it wrote out the cache,
>>> which leads to corruption.
>>>
>> For case 2, the free inode cache of a newly created snapshot is invalid.
>> So it's better to avoid modifying snapshotted trees.
>
> I think this behavior, writing out the inode cache after creating the snapshot,
> was implemented on purpose. Some inode IDs are freed after their tree is
> committed, so the newly created snapshot must cache the inode IDs again to
> guarantee the inode cache is right, even if we write out the inode cache of
> the trees before they are snapshotted. So it is unnecessary to write out the
> inode cache before creating the snapshot.
>
When the newly created snapshot is opened, orphan cleanup will find the
inodes that were freed after the commit and update the inode cache. So
technically, a rescan is not required.
> Li, am I right?
>
> Thanks
> Miao
>
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 6:46 ` Yan, Zheng
@ 2011-09-29 7:19 ` Miao Xie
2011-09-29 7:09 ` Yan, Zheng
0 siblings, 1 reply; 11+ messages in thread
From: Miao Xie @ 2011-09-29 7:19 UTC (permalink / raw)
To: Yan, Zheng; +Cc: Liu Bo, linux-btrfs, josef, chris.mason, lizf, dave
On Thu, 29 Sep 2011 14:46:20 +0800, Yan, Zheng wrote:
> On 09/29/2011 02:47 PM, Miao Xie wrote:
>> On Thu, 29 Sep 2011 12:25:56 +0800, Yan, Zheng wrote:
>>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>>> The btrfs snapshotting code requires that once a root has been
>>>> snapshotted, we don't change it during a commit.
>>>>
>>>> But there are two cases that lead to tree corruption:
>>>>
>>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>>> and this may change the src root while processing the following pending
>>>> snapshots, which corrupts the earlier snapshots;
>>>>
>>>> 2) the free inode cache was changing the roots when it wrote out the cache,
>>>> which leads to corruption.
>>>>
>>> For case 2, the free inode cache of a newly created snapshot is invalid.
>>> So it's better to avoid modifying snapshotted trees.
>>
>> I think this behavior, writing out the inode cache after creating the snapshot,
>> was implemented on purpose. Some inode IDs are freed after their tree is
>> committed, so the newly created snapshot must cache the inode IDs again to
>> guarantee the inode cache is right, even if we write out the inode cache of
>> the trees before they are snapshotted. So it is unnecessary to write out the
>> inode cache before creating the snapshot.
>>
>
> When the newly created snapshot is opened, orphan cleanup will find the
> inodes that were freed after the commit and update the inode cache. So
> technically, a rescan is not required.
Not only orphan inode IDs.
The inode IDs in the free_ino_pinned tree are also freed after the fs/file tree commit.
>
>> Li, am I right?
>>
>> Thanks
>> Miao
>>
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 7:19 ` Miao Xie
@ 2011-09-29 7:09 ` Yan, Zheng
0 siblings, 0 replies; 11+ messages in thread
From: Yan, Zheng @ 2011-09-29 7:09 UTC (permalink / raw)
To: miaox; +Cc: Liu Bo, linux-btrfs, josef, chris.mason, lizf, dave
On 09/29/2011 03:19 PM, Miao Xie wrote:
> On Thu, 29 Sep 2011 14:46:20 +0800, Yan, Zheng wrote:
>> On 09/29/2011 02:47 PM, Miao Xie wrote:
>>> On Thu, 29 Sep 2011 12:25:56 +0800, Yan, Zheng wrote:
>>>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>>>> The btrfs snapshotting code requires that once a root has been
>>>>> snapshotted, we don't change it during a commit.
>>>>>
>>>>> But there are two cases that lead to tree corruption:
>>>>>
>>>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>>>> and this may change the src root while processing the following pending
>>>>> snapshots, which corrupts the earlier snapshots;
>>>>>
>>>>> 2) the free inode cache was changing the roots when it wrote out the cache,
>>>>> which leads to corruption.
>>>>>
>>>> For case 2, the free inode cache of a newly created snapshot is invalid.
>>>> So it's better to avoid modifying snapshotted trees.
>>>
>>> I think this behavior, writing out the inode cache after creating the snapshot,
>>> was implemented on purpose. Some inode IDs are freed after their tree is
>>> committed, so the newly created snapshot must cache the inode IDs again to
>>> guarantee the inode cache is right, even if we write out the inode cache of
>>> the trees before they are snapshotted. So it is unnecessary to write out the
>>> inode cache before creating the snapshot.
>>>
>>
>> When the newly created snapshot is opened, orphan cleanup will find the
>> inodes that were freed after the commit and update the inode cache. So
>> technically, a rescan is not required.
>
> Not only orphan inode IDs.
> The inode IDs in the free_ino_pinned tree are also freed after the fs/file tree commit.
>
Any reason free_ino_pinned is required?
>>
>>> Li, am I right?
>>>
>>> Thanks
>>> Miao
>>>
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 4:25 ` Yan, Zheng
2011-09-29 6:47 ` Miao Xie
@ 2011-09-29 8:18 ` Liu Bo
2011-09-29 8:36 ` Yan, Zheng
1 sibling, 1 reply; 11+ messages in thread
From: Liu Bo @ 2011-09-29 8:18 UTC (permalink / raw)
To: Yan, Zheng; +Cc: linux-btrfs, josef, chris.mason, lizf, miaox, dave
On 09/29/2011 12:25 PM, Yan, Zheng wrote:
> On 09/29/2011 10:00 AM, Liu Bo wrote:
>> The btrfs snapshotting code requires that once a root has been
>> snapshotted, we don't change it during a commit.
>>
>> But there are two cases that lead to tree corruption:
>>
>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>> and this may change the src root while processing the following pending
>> snapshots, which corrupts the earlier snapshots;
>>
>> 2) the free inode cache was changing the roots when it wrote out the cache,
>> which leads to corruption.
>>
> For case 2, the free inode cache of a newly created snapshot is invalid.
> So it's better to avoid modifying snapshotted trees.
>
For case 2, by flushing the dirty inode cache during create_pending_snapshot,
we can avoid modifying snapshotted trees, as you advised.
But for case 1, I have no idea how to do the same thing, since we are not
allowed to commit per snapshot, which would make the performance terrible.
thanks,
liubo
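Case 1 can be illustrated with a toy model in which a single shared block stands in for the tree blocks that a snapshot shares with its source root. This is a sketch under stated assumptions, not the btrfs implementation: struct snap_node, struct snap_root, toy_modify() and toy_snapshot() are hypothetical names, and old blocks are never freed, for brevity.

```c
#include <stdlib.h>

/* Hypothetical single-block "tree"; not a real btrfs structure. */
struct snap_node {
	int value;		/* stands in for the block's contents */
	int generation;		/* transaction that allocated this block */
};

struct snap_root {
	struct snap_node *node;
	int force_cow;		/* models the flag added by the patch */
};

/* Change the root's block, copying it first unless in-place change is allowed. */
static void toy_modify(struct snap_root *root, int transid, int new_value)
{
	struct snap_node *n = root->node;

	if (n->generation != transid || root->force_cow) {
		struct snap_node *copy = malloc(sizeof(*copy));

		if (!copy)
			abort();
		*copy = *n;
		copy->generation = transid;
		root->node = copy;	/* the old block is left untouched */
		n = copy;
	}
	n->value = new_value;
}

/*
 * Take a snapshot inside the running transaction: the snapshot shares the
 * current block. With use_force_cow set, the source root is marked so that
 * later changes in the same transaction are copied instead of done in place.
 */
static struct snap_node *toy_snapshot(struct snap_root *root, int use_force_cow)
{
	if (use_force_cow)
		root->force_cow = 1;
	return root->node;
}
```

Without the flag, a change made to the source root after the first snapshot, but still in the same transaction, lands in the block that the snapshot also points to. With the flag set, the change goes to a fresh copy and the earlier snapshot keeps the contents it had when it was taken.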
>> This fixes things by making sure we force COW the block after we create a
>> snapshot during commiting a transaction, then any changes to the roots
>> will result in COW, and we get all the fs roots and snapshot roots to be
>> consistent.
>>
>> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>> fs/btrfs/ctree.c | 17 ++++++++++++++++-
>> fs/btrfs/ctree.h | 2 ++
>> fs/btrfs/transaction.c | 8 ++++++++
>> 3 files changed, 26 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 011cab3..49dad7d 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -514,10 +514,25 @@ static inline int should_cow_block(struct btrfs_trans_handle *trans,
>> struct btrfs_root *root,
>> struct extent_buffer *buf)
>> {
>> + /* ensure we can see the force_cow */
>> + smp_rmb();
>> +
>> + /*
>> + * We do not need to cow a block if
>> + * 1) this block is not created or changed in this transaction;
>> + * 2) this block does not belong to TREE_RELOC tree;
>> + * 3) the root is not forced COW.
>> + *
>> + * What is forced COW:
>> + * when we create snapshot during commiting the transaction,
>> + * after we've finished coping src root, we must COW the shared
>> + * block to ensure the metadata consistency.
>> + */
>> if (btrfs_header_generation(buf) == trans->transid &&
>> !btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN) &&
>> !(root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID &&
>> - btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
>> + btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)) &&
>> + !root->force_cow)
>> return 0;
>> return 1;
>> }
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 03912c5..bece0df 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1225,6 +1225,8 @@ struct btrfs_root {
>> * for stat. It may be used for more later
>> */
>> dev_t anon_dev;
>> +
>> + int force_cow;
>> };
>>
>> struct btrfs_ioctl_defrag_range_args {
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 7dc36fa..bf6e2b3 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -816,6 +816,10 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
>>
>> btrfs_save_ino_cache(root, trans);
>>
>> + /* see comments in should_cow_block() */
>> + root->force_cow = 0;
>> + smp_wmb();
>> +
>> if (root->commit_root != root->node) {
>> mutex_lock(&root->fs_commit_mutex);
>> switch_commit_root(root);
>> @@ -976,6 +980,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> btrfs_tree_unlock(old);
>> free_extent_buffer(old);
>>
>> + /* see comments in should_cow_block() */
>> + root->force_cow = 1;
>> + smp_wmb();
>> +
>> btrfs_set_root_node(new_root_item, tmp);
>> /* record when the snapshot was created in key.offset */
>> key.offset = trans->transid;
>
>
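The decision made by the patched should_cow_block() hunk quoted above can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: `model_buf`, `model_root`, and `model_should_cow` are hypothetical stand-ins that reduce the header flags and root key to plain booleans, and the memory barriers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent buffer header; not the real layout. */
struct model_buf {
	uint64_t generation;	/* transaction that last COWed this block */
	bool written;		/* models BTRFS_HEADER_FLAG_WRITTEN */
	bool reloc;		/* models BTRFS_HEADER_FLAG_RELOC */
};

/* Hypothetical stand-in for struct btrfs_root. */
struct model_root {
	bool is_reloc_tree;	/* models objectid == BTRFS_TREE_RELOC_OBJECTID */
	bool force_cow;		/* set once a snapshot of this root was created */
};

/* Returns true when the block must be copied before it is modified. */
bool model_should_cow(uint64_t transid, const struct model_root *root,
		      const struct model_buf *buf)
{
	/* A block may be modified in place only if all four conditions hold. */
	bool reusable = buf->generation == transid &&
			!buf->written &&
			!(!root->is_reloc_tree && buf->reloc) &&
			!root->force_cow;

	return !reusable;
}
```

In this model, a block created in the current transaction and not yet written is modified in place until `force_cow` is set on its root; from then on every modification goes through COW, which is the effect the patch description asks for.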
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 8:18 ` Liu Bo
@ 2011-09-29 8:36 ` Yan, Zheng
2011-09-29 8:40 ` Arne Jansen
0 siblings, 1 reply; 11+ messages in thread
From: Yan, Zheng @ 2011-09-29 8:36 UTC (permalink / raw)
To: Liu Bo; +Cc: linux-btrfs, josef, chris.mason, lizf, miaox, dave
On 09/29/2011 04:18 PM, Liu Bo wrote:
> On 09/29/2011 12:25 PM, Yan, Zheng wrote:
>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>> The btrfs snapshotting code requires that once a root has been
>>> snapshotted, we don't change it during a commit.
>>>
>>> But there are two cases that lead to tree corruption:
>>>
>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>> and this may change the src root when processing the following pending
>>> snapshots, which corrupts the earlier snapshots;
>>>
>>> 2) the free inode cache was changing the roots when it wrote the cache,
>>> which leads to corruption.
>>>
>> For case 2, the free inode cache of a newly created snapshot is invalid.
>> So it's better to avoid modifying snapshotted trees.
>>
>
> For case 2, by flushing the dirty inode cache during create_pending_snapshot,
> we can avoid modifying snapshotted trees, as you advise.
>
> But for case 1, I have no idea how to do the same thing, since committing
> once per snapshot is not an option: it would make performance terrible.
>
I think a commit per snapshot is acceptable. If you want a better solution, build
a dependency graph. http://en.wikipedia.org/wiki/Dependency_graph
> thanks,
> liubo
>
>
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 8:36 ` Yan, Zheng
@ 2011-09-29 8:40 ` Arne Jansen
2011-09-29 14:59 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Arne Jansen @ 2011-09-29 8:40 UTC (permalink / raw)
To: Yan, Zheng; +Cc: Liu Bo, linux-btrfs, josef, chris.mason, lizf, miaox, dave
On 29.09.2011 10:36, Yan, Zheng wrote:
> On 09/29/2011 04:18 PM, Liu Bo wrote:
>> On 09/29/2011 12:25 PM, Yan, Zheng wrote:
>>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>>> The btrfs snapshotting code requires that once a root has been
>>>> snapshotted, we don't change it during a commit.
>>>>
>>>> But there are two cases that lead to tree corruption:
>>>>
>>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>>> and this may change the src root when processing the following pending
>>>> snapshots, which corrupts the earlier snapshots;
>>>>
>>>> 2) the free inode cache was changing the roots when it wrote the cache,
>>>> which leads to corruption.
>>>>
>>> For case 2, the free inode cache of a newly created snapshot is invalid.
>>> So it's better to avoid modifying snapshotted trees.
>>>
>>
>> For case 2, by flushing the dirty inode cache during create_pending_snapshot,
>> we can avoid modifying snapshotted trees, as you advise.
>>
>> But for case 1, I have no idea how to do the same thing, since committing
>> once per snapshot is not an option: it would make performance terrible.
One snapshot per subvol per transaction is ok, but it must be possible to create
hundreds or thousands of snapshots for different subvols within one transaction.
Imagine a setup with 10000 subvols and creating one snapshot per hour.
-Arne
>>
> I think a commit per snapshot is acceptable. If you want a better solution, build
> a dependency graph. http://en.wikipedia.org/wiki/Dependency_graph
>
>> thanks,
>> liubo
>>
>>
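The constraint described in this message (at most one snapshot per subvol per transaction, but many subvols per transaction) can be illustrated with a small batching sketch. This is only an illustration under assumptions and not something proposed in the thread: `assign_batches` is a hypothetical helper, and source subvolumes are identified by plain integer IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assign each pending snapshot request to a commit batch so that no batch
 * contains two snapshots of the same source subvolume.
 *
 * src_ids[i] is the (hypothetical) ID of the subvolume that request i
 * snapshots. batch[i] receives the number of earlier requests with the
 * same source, which is the index of the first batch that is still free
 * for that source. Returns the number of batches (commits) needed.
 */
size_t assign_batches(const uint64_t *src_ids, size_t n, size_t *batch)
{
	size_t nbatches = 0;

	for (size_t i = 0; i < n; i++) {
		size_t seen = 0;

		/* Count earlier requests against the same source. */
		for (size_t j = 0; j < i; j++)
			if (src_ids[j] == src_ids[i])
				seen++;

		batch[i] = seen;
		if (seen + 1 > nbatches)
			nbatches = seen + 1;
	}
	return nbatches;
}
```

With this scheme, 10000 requests against 10000 different subvols all land in batch 0 and need a single commit; extra commits are needed only when the same subvol is snapshotted repeatedly. The quadratic scan keeps the sketch short; a real implementation would presumably use a lookup structure instead.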
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 8:40 ` Arne Jansen
@ 2011-09-29 14:59 ` Chris Mason
2011-10-27 3:09 ` Liu Bo
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2011-09-29 14:59 UTC (permalink / raw)
To: Arne Jansen; +Cc: Yan, Zheng, Liu Bo, linux-btrfs, josef, lizf, miaox, dave
Excerpts from Arne Jansen's message of 2011-09-29 04:40:30 -0400:
> On 29.09.2011 10:36, Yan, Zheng wrote:
> > On 09/29/2011 04:18 PM, Liu Bo wrote:
> >> On 09/29/2011 12:25 PM, Yan, Zheng wrote:
> >>> On 09/29/2011 10:00 AM, Liu Bo wrote:
> >>>> The btrfs snapshotting code requires that once a root has been
> >>>> snapshotted, we don't change it during a commit.
> >>>>
> >>>> But there are two cases that lead to tree corruption:
> >>>>
> >>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
> >>>> and this may change the src root when processing the following pending
> >>>> snapshots, which corrupts the earlier snapshots;
> >>>>
> >>>> 2) the free inode cache was changing the roots when it wrote the cache,
> >>>> which leads to corruption.
> >>>>
> >>> For case 2, the free inode cache of a newly created snapshot is invalid.
> >>> So it's better to avoid modifying snapshotted trees.
> >>>
> >>
> >> For case 2, by flushing the dirty inode cache during create_pending_snapshot,
> >> we can avoid modifying snapshotted trees, as you advise.
> >>
> >> But for case 1, I have no idea how to do the same thing, since committing
> >> once per snapshot is not an option: it would make performance terrible.
>
> One snapshot per subvol per transaction is ok, but it must be possible to create
> hundreds or thousands of snapshots for different subvols within one transaction.
> Imagine a setup with 10000 subvols and creating one snapshot per hour.
Agreed, we need to be able to do more than one snapshot per commit. Our
current commits are pretty heavy, so we do need to be able to batch them.
-chris
* Re: [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush
2011-09-29 14:59 ` Chris Mason
@ 2011-10-27 3:09 ` Liu Bo
0 siblings, 0 replies; 11+ messages in thread
From: Liu Bo @ 2011-10-27 3:09 UTC (permalink / raw)
To: Chris Mason
Cc: Arne Jansen, Yan, Zheng, linux-btrfs, josef, lizf, miaox, dave
On 09/29/2011 10:59 PM, Chris Mason wrote:
> Excerpts from Arne Jansen's message of 2011-09-29 04:40:30 -0400:
>> On 29.09.2011 10:36, Yan, Zheng wrote:
>>> On 09/29/2011 04:18 PM, Liu Bo wrote:
>>>> On 09/29/2011 12:25 PM, Yan, Zheng wrote:
>>>>> On 09/29/2011 10:00 AM, Liu Bo wrote:
>>>>>> The btrfs snapshotting code requires that once a root has been
>>>>>> snapshotted, we don't change it during a commit.
>>>>>>
>>>>>> But there are two cases that lead to tree corruption:
>>>>>>
>>>>>> 1) multi-threaded snapshots can commit several snapshots in one transaction,
>>>>>> and this may change the src root when processing the following pending
>>>>>> snapshots, which corrupts the earlier snapshots;
>>>>>>
>>>>>> 2) the free inode cache was changing the roots when it wrote the cache,
>>>>>> which leads to corruption.
>>>>>>
>>>>> For case 2, the free inode cache of a newly created snapshot is invalid.
>>>>> So it's better to avoid modifying snapshotted trees.
>>>>>
>>>> For case 2, by flushing the dirty inode cache during create_pending_snapshot,
>>>> we can avoid modifying snapshotted trees, as you advise.
>>>>
>>>> But for case 1, I have no idea how to do the same thing, since committing
>>>> once per snapshot is not an option: it would make performance terrible.
>> One snapshot per subvol per transaction is ok, but it must be possible to create
>> hundreds or thousands of snapshots for different subvols within one transaction.
>> Imagine a setup with 10000 subvols and creating one snapshot per hour.
>
> Agreed, we need to be able to do more than one snapshot per commit. Our
> current commits are pretty heavy, so we do need to be able to batch them.
>
Hi Chris,
Does this "force cow" approach meet your expectations? Or do we need to pick another solution,
such as building a dependency graph among snapshots?
IMO, apart from doing more COW while committing a transaction, "force cow" keeps the original
snapshot batching.
thanks,
liubo
> -chris
>
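The life cycle of the "force cow" flag discussed here (set in create_pending_snapshot(), cleared in commit_fs_roots(), read in should_cow_block()) can be sketched in portable C11. This is a user-space analogue under assumptions, not a translation of the patch: `cow_flag_root` and the three helpers are hypothetical names, and a release store paired with an acquire load stands in for the patch's smp_wmb()/smp_rmb() calls, which are placed differently (barrier after the store, barrier before the load).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the force_cow member of struct btrfs_root. */
struct cow_flag_root {
	atomic_int force_cow;
};

/* Writer, modelling create_pending_snapshot(): from now on, always COW. */
void flag_snapshot_created(struct cow_flag_root *root)
{
	atomic_store_explicit(&root->force_cow, 1, memory_order_release);
}

/* Writer, modelling commit_fs_roots(): reset for the next transaction. */
void flag_root_committed(struct cow_flag_root *root)
{
	atomic_store_explicit(&root->force_cow, 0, memory_order_release);
}

/* Reader, modelling the new check in should_cow_block(). */
bool flag_is_set(struct cow_flag_root *root)
{
	return atomic_load_explicit(&root->force_cow,
				    memory_order_acquire) != 0;
}
```

Several snapshots of the same root within one transaction simply set the flag again, so batching is unaffected; the cost is the extra COW between the first snapshot and the point where the flag is cleared.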
Thread overview: 11+ messages
2011-09-29 2:00 [PATCH] Btrfs: fix tree corruption after multi-thread snapshots and inode cache flush Liu Bo
2011-09-29 4:25 ` Yan, Zheng
2011-09-29 6:47 ` Miao Xie
2011-09-29 6:46 ` Yan, Zheng
2011-09-29 7:19 ` Miao Xie
2011-09-29 7:09 ` Yan, Zheng
2011-09-29 8:18 ` Liu Bo
2011-09-29 8:36 ` Yan, Zheng
2011-09-29 8:40 ` Arne Jansen
2011-09-29 14:59 ` Chris Mason
2011-10-27 3:09 ` Liu Bo