From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 1/2] btrfs: immediately drop extent maps after failed COW write
Date: Thu, 16 May 2024 07:32:52 +0930
Message-ID: <879bf1fa-14ce-44b1-93ba-258701ebab85@gmx.com>
In-Reply-To: <c9d7a03ee9730e1d864cb6fbe2d511dd8899a953.1715798440.git.fdmanana@suse.com>
On 2024/5/16 04:21, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If a write path in COW mode fails, either before submitting a bio for the
> new extents or an actual IO error happens, we can end up allowing a fast
> fsync to log file extent items that point to unwritten extents.
>
> This is because the ordered extent completion for a failed write is
> executed in a work queue. This means that once the write path unlocks the
> inode, a fast fsync can come and log the extent maps created by the write
> attempt before the work queue completes the ordered extent.
>
> For example consider a direct IO write, in COW mode, that fails at
> btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
> error:
>
> 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
>    set to false, meaning an error happened;
>
> 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
>    flag;
>
> 3) btrfs_finish_ordered_extent() queues the completion of the ordered
>    extent - so that btrfs_finish_one_ordered() will be executed later in
>    a work queue. That function will drop extent maps in the range when
>    it's executed, since the extent maps point to unwritten locations
>    (signaled by the BTRFS_ORDERED_IOERR flag);
>
> 4) After calling btrfs_finish_ordered_extent() we keep going down the
>    write path and unlock the inode;
>
> 5) After that a fast fsync starts and locks the inode;
>
> 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
>    task sees the extent maps that point to the unwritten locations and
>    logs file extent items based on them - it does not know they are
>    unwritten, and the fast fsync path does not wait for ordered extents
>    to complete in order to reduce latency.
>
> So to fix this make btrfs_finish_ordered_extent() drop the extent maps
> in the range if an error happened for a COW write.
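
The ordering in steps 1) to 6) above, and the effect of the fix, can be
sketched as a small deterministic user-space model. This is only an
illustration under stated assumptions: every name below (model_inode,
model_race() and so on) is hypothetical and not a btrfs API, and the "work
queue" is reduced to a function that the caller runs late on purpose.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are btrfs APIs. */
struct model_inode {
	bool has_extent_map;      /* extent map created by the COW write attempt */
	bool completion_pending;  /* ordered extent completion queued, not yet run */
};

/* Models the failed write path; drop_now selects the patched behaviour. */
static void model_failed_cow_write(struct model_inode *ino, bool drop_now)
{
	ino->has_extent_map = true;      /* the write attempt created the map */
	ino->completion_pending = true;  /* error: completion is deferred */
	if (drop_now)
		ino->has_extent_map = false; /* patched: drop before unlocking */
}

/* Models the deferred completion running later in the work queue. */
static void model_run_completion(struct model_inode *ino)
{
	ino->has_extent_map = false;
	ino->completion_pending = false;
}

/* Models a fast fsync: it does not wait for pending completions. */
static bool model_fast_fsync_logs_unwritten(const struct model_inode *ino)
{
	return ino->has_extent_map;
}

/* Returns true if an fsync that runs before the completion logs a bad extent. */
static bool model_race(bool drop_now)
{
	struct model_inode ino = { false, false };
	bool logged;

	model_failed_cow_write(&ino, drop_now);
	logged = model_fast_fsync_logs_unwritten(&ino); /* fsync wins the race */
	model_run_completion(&ino);                     /* work queue runs late */
	return logged;
}
```

In this model model_race(false) returns true (the fsync still sees the stale
extent map), while model_race(true) returns false because the map is dropped
before the fsync can run.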
>
> Note that this issue of using extent maps that point to unwritten
> locations cannot happen for reads, because in read paths we start by
> locking the extent range and wait for any ordered extents in the range
> to complete before looking for extent maps.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks for the detailed explanation on the issue.
Thanks,
Qu
> ---
> fs/btrfs/ordered-data.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 304d94f6d29b..3a3f21da6eb7 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -388,6 +388,33 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
>  	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
>  	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
>
> +	/*
> +	 * If this is a COW write it means we created new extent maps for the
> +	 * range and they point to an unwritten location if we got an error
> +	 * either before submitting a bio or during IO.
> +	 *
> +	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
> +	 * are queuing its completion below. During completion, at
> +	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
> +	 * unwritten extents.
> +	 *
> +	 * However because completion runs in a work queue we can end up
> +	 * unlocking the inode before the ordered extent is completed.
> +	 *
> +	 * That means that a fast fsync can happen before the work queue
> +	 * executes the completion of the ordered extent, and in that case
> +	 * the fsync will use the extent maps that point to unwritten extents,
> +	 * resulting in logging file extent items that point to unwritten
> +	 * locations. Unlike read paths, a fast fsync doesn't wait for ordered
> +	 * extent completion before proceeding (intentional to reduce latency).
> +	 *
> +	 * To be safe drop the new extent maps in the range (if we are doing COW)
> +	 * right here before we unlock the inode and allow a fsync to run.
> +	 */
> +	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
> +		btrfs_drop_extent_map_range(inode, file_offset,
> +					    file_offset + len - 1, false);
> +
>  	if (ret)
>  		btrfs_queue_ordered_fn(ordered);
>  	return ret;
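
The condition added by the hunk above, and the end offset it passes, can be
restated as a stand-alone sketch. The names here (MODEL_ORDERED_NOCOW,
model_should_drop(), model_drop_range()) are hypothetical, and the flag is a
plain bit mask rather than the kernel's test_bit() on ordered->flags. The
end offset is treated as inclusive because the patch passes
file_offset + len - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ordered extent flag; not the kernel value. */
#define MODEL_ORDERED_NOCOW (1u << 0)

struct model_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Mirrors the patch's condition: an error happened and this is not NOCOW. */
static bool model_should_drop(bool uptodate, unsigned int flags)
{
	return !uptodate && !(flags & MODEL_ORDERED_NOCOW);
}

/* Inclusive end offset, as passed in the patch: file_offset + len - 1. */
static struct model_range model_drop_range(uint64_t file_offset, uint64_t len)
{
	struct model_range r = { file_offset, file_offset + len - 1 };

	return r;
}
```

For example, a failed COW write of 8192 bytes at offset 4096 gives
model_should_drop(false, 0) == true and a range of [4096, 12287], while a
NOCOW write or a successful write leaves the extent maps alone.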
Thread overview: 13+ messages
2024-05-14 14:23 [PATCH 0/2] btrfs: fix a bug in the direct IO write path for COW writes fdmanana
2024-05-14 14:23 ` [PATCH 1/2] btrfs: drop extent maps after failed COW dio write fdmanana
2024-05-14 22:15 ` Qu Wenruo
2024-05-15 9:47 ` Filipe Manana
2024-05-14 14:23 ` [PATCH 2/2] btrfs: refactor btrfs_dio_submit_io() for less nesting and indentation fdmanana
2024-05-14 22:23 ` Qu Wenruo
2024-05-15 18:51 ` [PATCH v2 0/2] btrfs: fix a bug in the direct IO write path for COW writes fdmanana
2024-05-15 18:51 ` [PATCH v2 1/2] btrfs: immediately drop extent maps after failed COW write fdmanana
2024-05-15 22:02 ` Qu Wenruo [this message]
2024-05-15 18:51 ` [PATCH v2 2/2] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
2024-05-15 22:03 ` Qu Wenruo
2024-05-17 16:28 ` [PATCH v2 0/2] btrfs: fix a bug in the direct IO write path for COW writes David Sterba
2024-05-17 16:54 ` Filipe Manana