From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix deadlock when cloning inline extents and low on available space
Date: Tue, 25 May 2021 18:11:45 +0200 [thread overview]
Message-ID: <20210525161145.GD7604@twin.jikos.cz> (raw)
In-Reply-To: <fe861a6c4f23f3980fcf198bdbd7e92cdc6847b9.1621936270.git.fdmanana@suse.com>
On Tue, May 25, 2021 at 11:05:28AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> There are a few cases where cloning an inline extent requires copying data
> into a page of the destination inode. For these cases we are allocating
> the required data and metadata space while holding a leaf locked. This can
> result in a deadlock when we are low on available space because allocating
> the space may flush delalloc and two deadlock scenarios can happen:
>
> 1) When starting writeback for an inode with a very small dirty range that
> fits in an inline extent, we deadlock during the writeback when trying
> to insert the inline extent, at cow_file_range_inline(), if the extent
> is going to be located in the leaf for which we are already holding a
> read lock;
>
> 2) After successfully starting writeback, for non-inline extent cases,
> the async reclaim thread will hang waiting for an ordered extent to
> complete if the ordered extent completion needs to modify the leaf
> for which the clone task is holding a read lock (for adding or
> replacing file extent items). So the cloning task will wait forever
> on the async reclaim thread to make progress, which in turn is
> waiting for the ordered extent completion which in turn is waiting
> to acquire a write lock on the same leaf.
>
> So fix this by making sure we release the path (and therefore the leaf)
> everytime we need to copy the inline extent's data into a page of the
> destination inode, as by that time we do not need to have the leaf locked.
>
> Fixes: 05a5a7621ce66c ("Btrfs: implement full reflink support for inline extents")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Added to misc-next, thanks.
> ---
> fs/btrfs/reflink.c | 38 ++++++++++++++++++++++----------------
> 1 file changed, 22 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index 06682128d8fa..58ddc7ed9e84 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -207,10 +207,7 @@ static int clone_copy_inline_extent(struct inode *dst,
> * inline extent's data to the page.
> */
> ASSERT(key.offset > 0);
> - ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset,
> - inline_data, size, datal,
> - comp_type);
> - goto out;
> + goto copy_to_page;
> }
> } else if (i_size_read(dst) <= datal) {
> struct btrfs_file_extent_item *ei;
> @@ -226,13 +223,10 @@ static int clone_copy_inline_extent(struct inode *dst,
> BTRFS_FILE_EXTENT_INLINE)
> goto copy_inline_extent;
>
> - ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset,
> - inline_data, size, datal, comp_type);
> - goto out;
> + goto copy_to_page;
> }
>
> copy_inline_extent:
> - ret = 0;
> /*
> * We have no extent items, or we have an extent at offset 0 which may
> * or may not be inlined. All these cases are dealt the same way.
> @@ -244,11 +238,13 @@ static int clone_copy_inline_extent(struct inode *dst,
> * clone. Deal with all these cases by copying the inline extent
> * data into the respective page at the destination inode.
> */
> - ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset,
> - inline_data, size, datal, comp_type);
> - goto out;
> + goto copy_to_page;
> }
>
> + /*
> + * Release path before starting a new transaction so we don't hold locks
> + * that would confuse lockdep.
> + */
> btrfs_release_path(path);
> /*
> * If we end up here it means were copy the inline extent into a leaf
> @@ -285,11 +281,6 @@ static int clone_copy_inline_extent(struct inode *dst,
> ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end);
> out:
> if (!ret && !trans) {
> - /*
> - * Release path before starting a new transaction so we don't
> - * hold locks that would confuse lockdep.
> - */
> - btrfs_release_path(path);
> /*
> * No transaction here means we copied the inline extent into a
> * page of the destination inode.
> @@ -310,6 +301,21 @@ static int clone_copy_inline_extent(struct inode *dst,
> *trans_out = trans;
>
> return ret;
> +
> +copy_to_page:
> + /*
> + * Release our path because we don't need it anymore and also because
> + * copy_inline_to_page() needs to reserve data and metadata, which may
> + * need to flush delalloc when we are low on available space and
> + * therefore cause a deadlock if writeback of an inline extent needs to
> + * write to the same leaf or an ordered extent completion needs to write
> + * to the same leaf.
> + */
> + btrfs_release_path(path);
> +
> + ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset,
> + inline_data, size, datal, comp_type);
> + goto out;
> }
I don't like to see the pattern with the chained labels but in this case
I don't see a better way, so ok.
prev parent reply other threads:[~2021-05-25 16:14 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-25 10:05 [PATCH] btrfs: fix deadlock when cloning inline extents and low on available space fdmanana
2021-05-25 16:11 ` David Sterba [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210525161145.GD7604@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).