From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs: make btrfs_dirty_inode() to always reserve metadata space
Date: Fri, 8 Jan 2021 13:36:59 +0800 [thread overview]
Message-ID: <20210108053659.87728-1-wqu@suse.com> (raw)
There are several qgroup flush related bugs fixed recently, all of them
are caused by the fact that we can trigger qgroup metadata space
reservation holding a transaction handle.
Thankfully the only situation to trigger above reservation is
btrfs_dirty_inode().
Currently btrfs_dirty_inode() will try join transactio first, then
update the inode.
If btrfs_update_inode() fails with -ENOSPC, then it retry to start
transaction to reserve metadata space.
This not only forces us to reserve metadata space with a transaction
handle hold, but can't handle other errors like -EDQUOT.
This patch will make btrfs_dirty_inode() to call
btrfs_start_transaction() directly without first try joining then
starting, so that in try_flush_qgroup() we won't hold a trans handle.
This will slow down btrfs_dirty_inode() but my fstests doesn't show too
much different for most test cases, thus it may be worthy to skip such
performance "optimization".
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/delayed-inode.c | 41 +++++++---------------------------------
fs/btrfs/inode.c | 11 +----------
fs/btrfs/qgroup.c | 34 +++++++++------------------------
3 files changed, 17 insertions(+), 69 deletions(-)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 70c0340d839c..0f2454de0b2f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -617,42 +617,15 @@ static int btrfs_delayed_inode_reserve_metadata(
num_bytes = btrfs_calc_metadata_size(fs_info, 1);
/*
- * btrfs_dirty_inode will update the inode under btrfs_join_transaction
- * which doesn't reserve space for speed. This is a problem since we
- * still need to reserve space for this update, so try to reserve the
- * space.
+ * All of our caller should have metadata space reserved, either through
+ * btrfs_start_transaction() or have reserved space then
+ * btrfs_join_transaction(), or steal from delalloc_block_rsv.
*
- * Now if src_rsv == delalloc_block_rsv we'll let it just steal since
- * we always reserve enough to update the inode item.
+ * This ensure we have all of our metadata space reserved, without the
+ * need to reserve qgroup metadata space with a transaction hold.
*/
- if (!src_rsv || (!trans->bytes_reserved &&
- src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
- ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
- if (ret < 0)
- return ret;
- ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
- BTRFS_RESERVE_NO_FLUSH);
- /*
- * Since we're under a transaction reserve_metadata_bytes could
- * try to commit the transaction which will make it return
- * EAGAIN to make us stop the transaction we have, so return
- * ENOSPC instead so that btrfs_dirty_inode knows what to do.
- */
- if (ret == -EAGAIN) {
- ret = -ENOSPC;
- btrfs_qgroup_free_meta_prealloc(root, num_bytes);
- }
- if (!ret) {
- node->bytes_reserved = num_bytes;
- trace_btrfs_space_reservation(fs_info,
- "delayed_inode",
- btrfs_ino(inode),
- num_bytes, 1);
- } else {
- btrfs_qgroup_free_meta_prealloc(root, fs_info->nodesize);
- }
- return ret;
- }
+ ASSERT(src_rsv && (trans->bytes_reserved ||
+ src_rsv->type == BTRFS_BLOCK_RSV_DELALLOC));
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, true);
if (!ret) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9c2800fa80c6..da8e1c24ab22 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5911,20 +5911,11 @@ static int btrfs_dirty_inode(struct inode *inode)
if (test_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags))
return 0;
- trans = btrfs_join_transaction(root);
+ trans = btrfs_start_transaction(root, 1);
if (IS_ERR(trans))
return PTR_ERR(trans);
ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
- if (ret && ret == -ENOSPC) {
- /* whoops, lets try again with the full transaction */
- btrfs_end_transaction(trans);
- trans = btrfs_start_transaction(root, 1);
- if (IS_ERR(trans))
- return PTR_ERR(trans);
-
- ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
- }
btrfs_end_transaction(trans);
if (BTRFS_I(inode)->delayed_node)
btrfs_balance_delayed_items(fs_info);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 808370ada888..872a5fed9e1a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3535,37 +3535,24 @@ static int try_flush_qgroup(struct btrfs_root *root)
{
struct btrfs_trans_handle *trans;
int ret;
- bool can_commit = true;
/*
- * If current process holds a transaction, we shouldn't flush, as we
- * assume all space reservation happens before a transaction handle is
- * held.
+ * We shouldn't hold any trans handle here. Or we can cause deadlock
+ * when flushing.
*
- * But there are cases like btrfs_delayed_item_reserve_metadata() where
- * we try to reserve space with one transction handle already held.
- * In that case we can't commit transaction, but at least try to end it
- * and hope the started data writes can free some space.
+ * The if branch is for kernel without CONFIG_BTRFS_ASSERT configured.
*/
- if (current->journal_info &&
- current->journal_info != BTRFS_SEND_TRANS_STUB)
- can_commit = false;
+ ASSERT(current->journal_info == NULL ||
+ current->journal_info == BTRFS_SEND_TRANS_STUB);
+ if (unlikely(current->journal_info &&
+ current->journal_info != BTRFS_SEND_TRANS_STUB))
+ return 0;
/*
* We don't want to run flush again and again, so if there is a running
* one, we won't try to start a new flush, but exit directly.
*/
if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
- /*
- * We are already holding a transaction, thus we can block other
- * threads from flushing. So exit right now. This increases
- * the chance of EDQUOT for heavy load and near limit cases.
- * But we can argue that if we're already near limit, EDQUOT is
- * unavoidable anyway.
- */
- if (!can_commit)
- return 0;
-
wait_event(root->qgroup_flush_wait,
!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
return 0;
@@ -3582,10 +3569,7 @@ static int try_flush_qgroup(struct btrfs_root *root)
goto out;
}
- if (can_commit)
- ret = btrfs_commit_transaction(trans);
- else
- ret = btrfs_end_transaction(trans);
+ ret = btrfs_commit_transaction(trans);
out:
clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
wake_up(&root->qgroup_flush_wait);
--
2.29.2
next reply other threads:[~2021-01-08 5:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-08 5:36 Qu Wenruo [this message]
2021-01-12 12:19 ` [PATCH] btrfs: make btrfs_dirty_inode() to always reserve metadata space Nikolay Borisov
2021-01-12 12:29 ` Qu Wenruo
2021-01-12 15:24 ` [btrfs] e86bb85b1f: stress-ng.utime.ops_per_sec -70.1% regression kernel test robot
2021-01-13 7:15 ` Qu Wenruo
2021-02-18 15:28 ` [PATCH] btrfs: make btrfs_dirty_inode() to always reserve metadata space Nikolay Borisov
2021-02-19 0:19 ` Qu Wenruo
2021-02-18 16:14 ` Josef Bacik
2021-02-19 0:31 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210108053659.87728-1-wqu@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox