From: Miao Xie <miaox@cn.fujitsu.com>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: Josef Bacik <jbacik@fusionio.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 2/5] Btrfs: fix trans block rsv regression
Date: Sun, 23 Sep 2012 18:08:53 +0800 [thread overview]
Message-ID: <505EDFB5.7040902@cn.fujitsu.com> (raw)
In-Reply-To: <50532B7D.5060906@oracle.com>
On fri, 14 Sep 2012 21:05:01 +0800, Liu Bo wrote:
> On 09/14/2012 09:01 PM, Liu Bo wrote:
>> On 09/14/2012 08:41 PM, Josef Bacik wrote:
>>> On Fri, Sep 14, 2012 at 02:58:04AM -0600, Liu Bo wrote:
>>>> In some workloads we have nested joining transaction operations,
>>>> eg.
>>>> run_delalloc_nocow
>>>> btrfs_join_transaction
>>>> cow_file_range
>>>> btrfs_join_transaction
>>>>
>>>> it can be a serious bug since each trans handler has only two
>>>> block_rsv, orig_rsv and block_rsv, which means we may lose our
>>>> first block_rsv after two joining transaction operations:
>>>>
>>>> 1) btrfs_start_transaction
>>>> trans->block_rsv = A
>>>>
>>>> 2) btrfs_join_transaction
>>>> trans->orig_rsv = trans->block_rsv; ---> orig_rsv is now A
>>>> trans->block_rsv = B
>>>>
>>>> 3) btrfs_join_transaction
>>>> trans->orig_rsv = trans->block_rsv; ---> orig_rsv is now B
>>>> trans->block_rsv = C
>>>> ...
>>>>
>>>
>>> I'd like to see the actual stack trace where this happens, because I don't think
>>> it can happen. And if it is we need to look at that specific case and adjust it
>>> as necessary and not add a bunch of kmallocs just to track the block_rsv,
>>> because frankly it's not that big of a deal, it was just put into place in case
>>> somebody wasn't expecting a call they made to start another transaction and
>>> reset the block_rsv, which I don't actually think happens anywhere. So NAK on
>>> this patch, give me more information so I can figure out the right way to deal
>>> with this. Thanks,
>>>
>>
>> Fine, please run xfstests 068 till it hits a BUG_ON inside either btrfs_delete_delayed_dir_index or
>> btrfs_insert_delayed_dir_index.
>>
>> What I saw is that the orig_rsv and block_rsv is both delalloc_block_rsv, which is already lack of space.
>>
>
> and trans->use_count has been 3.
Hi, Liu
Do you still look into this problem? I think the following patch can help you.
This patch was made to improve btrfs_run_ordered_operations(), I found it can fix
the problem that you pointed out in this mail.
Thanks
Miao
Subject: [PATCH] Btrfs: make ordered operations be handled by multi-thread
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/disk-io.c | 7 ++++
fs/btrfs/ordered-data.c | 88 ++++++++++++++++++++++++++++++++++++++--------
fs/btrfs/ordered-data.h | 2 +-
fs/btrfs/transaction.c | 18 +++++++--
5 files changed, 95 insertions(+), 21 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dbb461f..fd7ed9f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1328,6 +1328,7 @@ struct btrfs_fs_info {
*/
struct btrfs_workers fixup_workers;
struct btrfs_workers delayed_workers;
+ struct btrfs_workers ordered_operation_workers;
struct task_struct *transaction_kthread;
struct task_struct *cleaner_kthread;
int thread_pool_size;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7fb7069..e49665f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2317,6 +2317,10 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->readahead_workers, "readahead",
fs_info->thread_pool_size,
&fs_info->generic_worker);
+ btrfs_init_workers(&fs_info->ordered_operation_workers,
+ "ordered-operations",
+ fs_info->thread_pool_size,
+ &fs_info->generic_worker);
/*
* endios are largely parallel and should have a very
@@ -2346,6 +2350,7 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(&fs_info->delayed_workers);
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
+ ret |= btrfs_start_workers(&fs_info->ordered_operation_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
@@ -2649,6 +2654,7 @@ fail_tree_roots:
fail_sb_buffer:
btrfs_stop_workers(&fs_info->generic_worker);
+ btrfs_stop_workers(&fs_info->ordered_operation_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
btrfs_stop_workers(&fs_info->fixup_workers);
btrfs_stop_workers(&fs_info->delalloc_workers);
@@ -3256,6 +3262,7 @@ int close_ctree(struct btrfs_root *root)
btrfs_stop_workers(&fs_info->delayed_workers);
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
+ btrfs_stop_workers(&fs_info->ordered_operation_workers);
#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
if (btrfs_test_opt(root, CHECK_INTEGRITY))
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4ae1014..a4b1316 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -26,6 +26,7 @@
#include "extent_io.h"
static struct kmem_cache *btrfs_ordered_extent_cache;
+static struct kmem_cache *btrfs_ordered_operation_work_cache;
static u64 entry_end(struct btrfs_ordered_extent *entry)
{
@@ -519,6 +520,28 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root,
spin_unlock(&root->fs_info->ordered_extent_lock);
}
+struct btrfs_ordered_operation_work {
+ struct inode *inode;
+ struct completion completion;
+ int wait;
+ struct list_head list;
+ struct btrfs_work work;
+};
+
+static void btrfs_run_ordered_operation(struct btrfs_work *work)
+{
+ struct btrfs_ordered_operation_work *ordered_work;
+
+ ordered_work = container_of(work, struct btrfs_ordered_operation_work,
+ work);
+ if (ordered_work->wait)
+ btrfs_wait_ordered_range(ordered_work->inode, 0, (u64)-1);
+ else
+ filemap_flush(ordered_work->inode->i_mapping);
+ btrfs_add_delayed_iput(ordered_work->inode);
+ complete(&ordered_work->completion);
+}
+
/*
* this is used during transaction commit to write all the inodes
* added to the ordered operation list. These files must be fully on
@@ -529,13 +552,17 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root,
* extra check to make sure the ordered operation list really is empty
* before we return
*/
-void btrfs_run_ordered_operations(struct btrfs_root *root, int wait)
+int btrfs_run_ordered_operations(struct btrfs_root *root, int wait)
{
struct btrfs_inode *btrfs_inode;
struct inode *inode;
struct list_head splice;
+ struct list_head works;
+ struct btrfs_ordered_operation_work *work, *next;
+ int ret = 0;
INIT_LIST_HEAD(&splice);
+ INIT_LIST_HEAD(&works);
mutex_lock(&root->fs_info->ordered_operations_mutex);
spin_lock(&root->fs_info->ordered_extent_lock);
@@ -543,6 +570,7 @@ again:
list_splice_init(&root->fs_info->ordered_operations, &splice);
while (!list_empty(&splice)) {
+
btrfs_inode = list_entry(splice.next, struct btrfs_inode,
ordered_operations);
@@ -559,16 +587,34 @@ again:
list_add_tail(&BTRFS_I(inode)->ordered_operations,
&root->fs_info->ordered_operations);
}
+
+ if (!inode)
+ continue;
spin_unlock(&root->fs_info->ordered_extent_lock);
- if (inode) {
- if (wait)
- btrfs_wait_ordered_range(inode, 0, (u64)-1);
- else
- filemap_flush(inode->i_mapping);
- btrfs_add_delayed_iput(inode);
+ work = kmem_cache_zalloc(btrfs_ordered_operation_work_cache,
+ GFP_NOFS);
+ if (!work) {
+ if (list_empty(&BTRFS_I(inode)->ordered_operations))
+ list_add_tail(&btrfs_inode->ordered_operations,
+ &splice);
+ spin_lock(&root->fs_info->ordered_extent_lock);
+ list_splice_tail(&splice,
+ &root->fs_info->ordered_operations);
+ spin_unlock(&root->fs_info->ordered_extent_lock);
+ ret = -ENOMEM;
+ goto out;
}
+ init_completion(&work->completion);
+ INIT_LIST_HEAD(&work->list);
+ work->inode = inode;
+ work->wait = wait;
+ work->work.func = btrfs_run_ordered_operation;
+ list_add_tail(&work->list, &works);
+ btrfs_queue_worker(&root->fs_info->ordered_operation_workers,
+ &work->work);
+
cond_resched();
spin_lock(&root->fs_info->ordered_extent_lock);
}
@@ -576,7 +622,14 @@ again:
goto again;
spin_unlock(&root->fs_info->ordered_extent_lock);
+out:
+ list_for_each_entry_safe(work, next, &works, list) {
+ list_del_init(&work->list);
+ wait_for_completion(&work->completion);
+ kmem_cache_free(btrfs_ordered_operation_work_cache, work);
+ }
mutex_unlock(&root->fs_info->ordered_operations_mutex);
+ return ret;
}
/*
@@ -944,15 +997,6 @@ void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
if (last_mod < root->fs_info->last_trans_committed)
return;
- /*
- * the transaction is already committing. Just start the IO and
- * don't bother with all of this list nonsense
- */
- if (trans && root->fs_info->running_transaction->blocked) {
- btrfs_wait_ordered_range(inode, 0, (u64)-1);
- return;
- }
-
spin_lock(&root->fs_info->ordered_extent_lock);
if (list_empty(&BTRFS_I(inode)->ordered_operations)) {
list_add_tail(&BTRFS_I(inode)->ordered_operations,
@@ -969,6 +1013,16 @@ int __init ordered_data_init(void)
NULL);
if (!btrfs_ordered_extent_cache)
return -ENOMEM;
+
+ btrfs_ordered_operation_work_cache = kmem_cache_create(
+ "btrfs_ordered_operation_work",
+ sizeof(struct btrfs_ordered_operation_work), 0,
+ SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+ NULL);
+ if (!btrfs_ordered_operation_work_cache) {
+ kmem_cache_destroy(btrfs_ordered_extent_cache);
+ return -ENOMEM;
+ }
return 0;
}
@@ -976,4 +1030,6 @@ void ordered_data_exit(void)
{
if (btrfs_ordered_extent_cache)
kmem_cache_destroy(btrfs_ordered_extent_cache);
+ if (btrfs_ordered_operation_work_cache)
+ kmem_cache_destroy(btrfs_ordered_operation_work_cache);
}
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index d1ddaef..82f9f67 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -186,7 +186,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
struct btrfs_ordered_extent *ordered);
int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr, u32 *sum);
-void btrfs_run_ordered_operations(struct btrfs_root *root, int wait);
+int btrfs_run_ordered_operations(struct btrfs_root *root, int wait);
void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *inode);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 115f054..1538b50 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1362,15 +1362,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
struct btrfs_transaction *cur_trans = trans->transaction;
struct btrfs_transaction *prev_trans = NULL;
DEFINE_WAIT(wait);
- int ret = -EIO;
+ int ret;
int should_grow = 0;
unsigned long now = get_seconds();
int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT);
- btrfs_run_ordered_operations(root, 0);
+ ret = btrfs_run_ordered_operations(root, 0);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ goto cleanup_transaction;
+ }
- if (cur_trans->aborted)
+ if (cur_trans->aborted) {
+ ret = cur_trans->aborted;
goto cleanup_transaction;
+ }
/* make a pass through all the delayed refs we have so far
* any runnings procs may add more while we are here
@@ -1466,7 +1472,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
* it here and no for sure that nothing new will be added
* to the list
*/
- btrfs_run_ordered_operations(root, 1);
+ ret = btrfs_run_ordered_operations(root, 1);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ goto cleanup_transaction;
+ }
prepare_to_wait(&cur_trans->writer_wait, &wait,
TASK_UNINTERRUPTIBLE);
--
1.7.6.5
next prev parent reply other threads:[~2012-09-23 10:08 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-14 8:58 [PATCH 1/5] Btrfs: fix deadlock with freeze and sync Liu Bo
2012-09-14 8:58 ` [PATCH 2/5] Btrfs: fix trans block rsv regression Liu Bo
2012-09-14 11:15 ` David Sterba
2012-09-14 11:25 ` Liu Bo
2012-09-14 12:07 ` David Sterba
2012-09-14 12:16 ` Liu Bo
2012-09-14 12:41 ` Josef Bacik
2012-09-14 13:01 ` Liu Bo
2012-09-14 13:05 ` Liu Bo
2012-09-23 10:08 ` Miao Xie [this message]
2012-09-14 13:26 ` David Sterba
2012-09-14 8:58 ` [PATCH 3/5] Btrfs: cleanup for duplicated code in find_free_extent Liu Bo
2012-09-14 11:36 ` David Sterba
2012-09-14 8:58 ` [PATCH 4/5] Btrfs: cleanup fs_info->hashers Liu Bo
2012-09-14 11:21 ` David Sterba
2012-09-14 8:58 ` [PATCH 5/5] Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents Liu Bo
2012-09-14 12:45 ` Josef Bacik
2012-09-14 12:55 ` Liu Bo
2012-09-14 13:01 ` Josef Bacik
2012-09-14 13:09 ` David Sterba
2012-09-14 10:07 ` [PATCH 1/5] Btrfs: fix deadlock with freeze and sync Miao Xie
2012-09-14 11:30 ` Liu Bo
2012-09-14 12:42 ` Josef Bacik
2012-09-14 13:03 ` Liu Bo
2012-09-14 14:41 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=505EDFB5.7040902@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=bo.li.liu@oracle.com \
--cc=jbacik@fusionio.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.