Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 1/2] btrfs: immediately drop extent maps after failed COW write
Date: Thu, 16 May 2024 07:32:52 +0930
Message-ID: <879bf1fa-14ce-44b1-93ba-258701ebab85@gmx.com>
In-Reply-To: <c9d7a03ee9730e1d864cb6fbe2d511dd8899a953.1715798440.git.fdmanana@suse.com>



On 2024/5/16 04:21, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If a write path in COW mode fails, either before submitting a bio for the
> new extents or because an actual IO error happens, we can end up allowing
> a fast fsync to log file extent items that point to unwritten extents.
>
> This is because the ordered extent completion for a failed write is
> executed in a work queue. This means that once the write path unlocks the
> inode, a fast fsync can come and log the extent maps created by the write
> attempt before the work queue completes the ordered extent.
>
> For example consider a direct IO write, in COW mode, that fails at
> btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
> error:
>
> 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
>     set to false, meaning an error happened;
>
> 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
>     flag;
>
> 3) btrfs_finish_ordered_extent() queues the completion of the ordered
>     extent - so that btrfs_finish_one_ordered() will be executed later in
>     a work queue. That function will drop extent maps in the range when
>     it's executed, since the extent maps point to unwritten locations
>     (signaled by the BTRFS_ORDERED_IOERR flag);
>
> 4) After calling btrfs_finish_ordered_extent() we keep going down the
>     write path and unlock the inode;
>
> 5) After that a fast fsync starts and locks the inode;
>
> 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
>     task sees the extent maps that point to the unwritten locations and
>     logs file extent items based on them - it does not know they are
>     unwritten, and the fast fsync path does not wait for ordered extents
>     to complete in order to reduce latency.
>
> So to fix this make btrfs_finish_ordered_extent() drop the extent maps
> in the range if an error happened for a COW write.
>
> Note that this issue of using extent maps that point to unwritten
> locations can not happen for reads, because in read paths we start by
> locking the extent range and wait for any ordered extents in the range
> to complete before looking for extent maps.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks for the detailed explanation of the issue.
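
For anyone following along, the race in steps 1) to 6) can be replayed
with a small userspace toy model. This is only a sketch, not kernel code:
every toy_* name is hypothetical, the "work queue" is reduced to a flag
that is processed later, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_inode {
	bool em_present;        /* an extent map exists for the range */
	bool em_written;        /* the data it points to reached disk */
	bool completion_queued; /* ordered extent completion is pending */
};

/* COW write path: create the extent map, then report the IO result. */
void toy_cow_write(struct toy_inode *inode, bool io_ok, bool drop_on_error)
{
	inode->em_present = true;
	inode->em_written = io_ok;
	/* Models the fix: drop the map before the inode is unlocked. */
	if (!io_ok && drop_on_error)
		inode->em_present = false;
	/* Completion always runs later, standing in for the work queue. */
	inode->completion_queued = true;
}

/* Deferred completion: drops the extent map of a failed write. */
void toy_run_completion(struct toy_inode *inode)
{
	if (inode->completion_queued && !inode->em_written)
		inode->em_present = false;
	inode->completion_queued = false;
}

/*
 * Fast fsync: uses whatever extent maps exist, without waiting for
 * completion. Returns true if it would log an unwritten extent.
 */
bool toy_fast_fsync_logs_unwritten(const struct toy_inode *inode)
{
	return inode->em_present && !inode->em_written;
}

/* Replay the failed write with and without the early drop. */
void toy_replay(void)
{
	struct toy_inode before = {0}, after = {0};

	/* Without the early drop, fsync wins the race and logs bad items. */
	toy_cow_write(&before, false, false);
	assert(toy_fast_fsync_logs_unwritten(&before));

	/* Completion fixes the extent map, but only after the fsync ran. */
	toy_run_completion(&before);
	assert(!toy_fast_fsync_logs_unwritten(&before));

	/* With the early drop, there is nothing stale for fsync to log. */
	toy_cow_write(&after, false, true);
	assert(!toy_fast_fsync_logs_unwritten(&after));
}
```

Calling toy_replay() trips no assertion. The "before" case only becomes
safe once the deferred completion has run, which is exactly the window
the patch closes.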

Thanks,
Qu
> ---
>   fs/btrfs/ordered-data.c | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
>
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 304d94f6d29b..3a3f21da6eb7 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -388,6 +388,33 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
>   	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
>   	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
>
> +	/*
> +	 * If this is a COW write it means we created new extent maps for the
> +	 * range and they point to an unwritten location if we got an error
> +	 * either before submitting a bio or during IO.
> +	 *
> +	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
> +	 * are queuing its completion below. During completion, at
> +	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
> +	 * unwritten extents.
> +	 *
> +	 * However because completion runs in a work queue we can end up
> +	 * unlocking the inode before the ordered extent is completed.
> +	 *
> +	 * That means that a fast fsync can happen before the work queue
> +	 * executes the completion of the ordered extent, and in that case
> +	 * the fsync will use the extent maps that point to unwritten extents,
> +	 * resulting in logging file extent items that point to unwritten
> +	 * locations. Unlike read paths, a fast fsync doesn't wait for ordered
> +	 * extent completion before proceeding (intentional to reduce latency).
> +	 *
> +	 * To be safe drop the new extent maps in the range (if we are doing COW)
> +	 * right here before we unlock the inode and allow a fsync to run.
> +	 */
> +	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
> +		btrfs_drop_extent_map_range(inode, file_offset,
> +					    file_offset + len - 1, false);
> +
>   	if (ret)
>   		btrfs_queue_ordered_fn(ordered);
>   	return ret;
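
The new condition and its range arithmetic can be checked in isolation
with a small standalone sketch. The assumptions here: the patch passes
file_offset + len - 1, which suggests the end of the dropped range is
inclusive, and TOY_ORDERED_NOCOW, struct toy_range and
toy_range_to_drop() are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the BTRFS_ORDERED_NOCOW flag bit. */
#define TOY_ORDERED_NOCOW (1UL << 0)

/* Byte range to drop; 'valid' is false when nothing should be dropped. */
struct toy_range {
	uint64_t start;
	uint64_t end;   /* inclusive, mirroring file_offset + len - 1 */
	bool valid;
};

/* Mirrors the shape of the check: failed write and not a NOCOW write. */
struct toy_range toy_range_to_drop(unsigned long flags, bool uptodate,
				   uint64_t file_offset, uint64_t len)
{
	struct toy_range r = { 0, 0, false };

	if (!uptodate && !(flags & TOY_ORDERED_NOCOW)) {
		r.start = file_offset;
		r.end = file_offset + len - 1;
		r.valid = true;
	}
	return r;
}
```

For a failed COW write of 8192 bytes at offset 4096 this gives the range
[4096, 12287]. A NOCOW write or a successful write gives no range at all.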


