From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/2] btrfs: fix deadlock with fsync+fiemap+full sync
Date: Wed, 1 Jun 2022 17:20:40 +0100 [thread overview]
Message-ID: <20220601162040.GA3307738@falcondesktop> (raw)
In-Reply-To: <b069a36b638b96c599b5d31d46d789039341ad9f.1654096526.git.josef@toxicpanda.com>
On Wed, Jun 01, 2022 at 11:16:25AM -0400, Josef Bacik wrote:
> We are hitting the following deadlock in production occasionally
>
> Task 1 Task 2 Task 3 Task 4 Task 5
> fsync(A)
> start trans
> start commit
> falloc(A)
> lock 5m-10m
> start trans
> wait for commit
> fiemap(A)
> lock 0-10m
> wait for 5m-10m
> (have 0-5m locked)
>
> have btrfs_need_log_full_commit
> !full_sync
> wait_ordered_extents
> finish_ordered_io(A)
> lock 0-5m
> DEADLOCK
>
> We have an existing dependency of file extent lock -> transaction.
> However in fsync if we tried to do the fast logging, but then had to
> fall back to committing the transaction, we will be forced to call
> btrfs_wait_ordered_range() to make sure all of our extents are updated.
>
> This creates a dependency of transaction -> file extent lock, because
> btrfs_finish_ordered_io() will need to take the file extent lock in
> order to run the ordered extents.
>
> Fix this by stopping the transaction if we have to do the full commit
> and we attempted to do the fast logging. Then attach to the transaction
> and commit it if we need to.
The subject is confusing, it mentions deadlock with "full fsync". With that
you mean a transaction commit, and not the slow fsync mode (related to the
inode flag BTRFS_INODE_NEEDS_FULL_SYNC). And looking at the diagram and the
code we mention !full_sync, the fast fsync path, so that makes it even more
confusing.
Can we replace "full fsync" with transaction commit (subject)?
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/file.c | 67 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 52 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 1fd827b99c1b..9c799731cc70 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2323,25 +2323,62 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> */
> btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP);
>
> - if (ret != BTRFS_NO_LOG_SYNC) {
> + if (ret == BTRFS_NO_LOG_SYNC) {
> + ret = btrfs_end_transaction(trans);
> + goto out;
> + }
> +
> + /* We successfully logged the inode, attempt to sync the log. */
> + if (!ret) {
> + ret = btrfs_sync_log(trans, root, &ctx);
> if (!ret) {
> - ret = btrfs_sync_log(trans, root, &ctx);
> - if (!ret) {
> - ret = btrfs_end_transaction(trans);
> - goto out;
> - }
> - }
> - if (!full_sync) {
> - ret = btrfs_wait_ordered_range(inode, start, len);
> - if (ret) {
> - btrfs_end_transaction(trans);
> - goto out;
> - }
> + ret = btrfs_end_transaction(trans);
> + goto out;
> }
> - ret = btrfs_commit_transaction(trans);
> - } else {
> + }
> +
> + /*
> + * At this point we need to commit the transaction because we had
> + * btrfs_need_log_full_commit() or some other error.
> + *
> + * If we didn't do a full sync we have to stop the trans handle, wait on
> + * the ordered extents, start it again and commit the transaction. If
> + * we attempt to wait on the ordered extents here we could deadlock with
> + * something like fallocate() that is holding the extent lock trying to
> + * start a transaction while some other thread is trying to commit the
> + * transaction while we (fsync) are currently holding the transaction
> + * open.
> + */
> + if (!full_sync) {
> ret = btrfs_end_transaction(trans);
> + if (ret)
> + goto out;
> + ret = btrfs_wait_ordered_range(inode, start, len);
We could wait only on the ordered extents collected at ctx.ordered_extents,
like this it would avoid the need to flush and wait for any new writes that
may have started after we unlocked the inode.
There's no helper to just wait for a list of ordered extents, so that's
probably why you made it like that, to make it as short as possible.
But still, something to consider even if for later only.
Otherwise it looks good, thanks.
> + if (ret)
> + goto out;
> +
> + /*
> + * This is safe to use here because we're only interested in
> + * making sure the transaction that had the ordered extents is
> + * committed. We aren't waiting on anything past this point,
> + * we're purely getting the transaction and committing it.
> + */
> + trans = btrfs_attach_transaction_barrier(root);
> + if (IS_ERR(trans)) {
> + ret = PTR_ERR(trans);
> +
> + /*
> + * We committed the transaction and there's no currently
> + * running transaction, this means everything we care
> + * about made it to disk and we are done.
> + */
> + if (ret == -ENOENT)
> + ret = 0;
> + goto out;
> + }
> }
> +
> + ret = btrfs_commit_transaction(trans);
> out:
> ASSERT(list_empty(&ctx.list));
> err = file_check_and_advance_wb_err(file);
> --
> 2.26.3
>
prev parent reply other threads:[~2022-06-01 16:20 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-01 15:16 [PATCH 0/2] btrfs: fix deadlock with fsync and full sync Josef Bacik
2022-06-01 15:16 ` [PATCH 1/2] btrfs: make the return value for log syncing consistent Josef Bacik
2022-06-01 16:25 ` Filipe Manana
2022-06-01 15:16 ` [PATCH 2/2] btrfs: fix deadlock with fsync+fiemap+full sync Josef Bacik
2022-06-01 16:20 ` Filipe Manana [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220601162040.GA3307738@falcondesktop \
--to=fdmanana@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox