From: Boris Burkov <boris@bur.io>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/3] btrfs: add missing inode updates on each iteration when replacing extents
Date: Mon, 6 Jun 2022 15:11:52 -0700
Message-ID: <Yp57qB5gjZ1wpnja@zen>
In-Reply-To: <980e6be197825045a08ad6d463456bc73521e4d4.1654508104.git.fdmanana@suse.com>
On Mon, Jun 06, 2022 at 10:41:18AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When replacing file extents, called during fallocate, hole punching,
> clone and deduplication, we may not be able to replace/drop all the
> target file extent items with a single transaction handle. We may get
> -ENOSPC while doing it, in which case we release the transaction handle,
> balance the dirty pages of the btree inode, flush delayed items and get
> a new transaction handle to operate on what's left of the target range.
>
> By dropping and replacing file extent items we have effectively modified

How can you be sure that the inode was definitely modified? Is it possible
for btrfs_drop_extents() to return -ENOSPC without having dropped any extents?

> the inode, so we should bump its iversion and update its mtime/ctime
> before we update the inode item. This is because if the transaction
> we used for partially modifying the inode gets committed by someone after
> we release it and before we finish the rest of the range, and a power
> failure happens, then after mounting the filesystem our inode has an outdated
> iversion and mtime/ctime, corresponding to the values it had before we
> changed it.
>
> So add the missing iversion and mtime/ctime updates.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/ctree.h | 2 ++
> fs/btrfs/file.c | 19 +++++++++++++++++++
> fs/btrfs/inode.c | 1 +
> fs/btrfs/reflink.c | 1 +
> 4 files changed, 23 insertions(+)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 55dee1564e90..737cd59d16b6 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1330,6 +1330,8 @@ struct btrfs_replace_extent_info {
> * existing extent into a file range.
> */
> bool is_new_extent;
> + /* Indicate if we should update the inode's mtime and ctime. */
> + bool update_times;
> /* Meaningful only if is_new_extent is true. */
> int qgroup_reserved;
> /*
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 1fd827b99c1b..29de433b7804 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2803,6 +2803,25 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
> extent_info->file_offset += replace_len;
> }
>
> + /*
> + * We are releasing our handle on the transaction, balance the
> + * dirty pages of the btree inode and flush delayed items, and
> + * then get a new transaction handle, which may now point to a
> + * new transaction in case someone else may have committed the
> + * transaction we used to replace/drop file extent items. So
> + * bump the inode's iversion and update mtime and ctime except
> + * if we are called from a dedupe context. This is because a
> + * power failure/crash may happen after the transaction is
> + * committed and before we finish replacing/dropping all the
> + * file extent items we need.
> + */
> + inode_inc_iversion(&inode->vfs_inode);
> +
> + if (!extent_info || extent_info->update_times) {
> + inode->vfs_inode.i_mtime = current_time(&inode->vfs_inode);
> + inode->vfs_inode.i_ctime = inode->vfs_inode.i_mtime;
> + }
> +
> ret = btrfs_update_inode(trans, root, inode);
> if (ret)
> break;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3ede3e873c2a..ab4ebcb7878c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9907,6 +9907,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
> extent_info.file_offset = file_offset;
> extent_info.extent_buf = (char *)&stack_fi;
> extent_info.is_new_extent = true;
> + extent_info.update_times = true;
> extent_info.qgroup_reserved = qgroup_released;
> extent_info.insertions = 0;
>
> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index 7e3b0aa318c1..977e0d218d79 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -497,6 +497,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
> clone_info.file_offset = new_key.offset;
> clone_info.extent_buf = buf;
> clone_info.is_new_extent = false;
> + clone_info.update_times = !no_time_update;
> ret = btrfs_replace_file_extents(BTRFS_I(inode), path,
> drop_start, new_key.offset + datal - 1,
> &clone_info, &trans);
> --
> 2.35.1
>
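
To make the failure mode in the commit message concrete, here is a minimal
user-space sketch under a toy model. struct toy_inode, struct
toy_extent_info, toy_should_update_times(), toy_replace_range() and the fake
integer clock are all hypothetical names, not btrfs code; the only thing
taken from the patch is the ordering (bump iversion and mtime/ctime on every
iteration before the inode item is updated) and the
"!extent_info || extent_info->update_times" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> /* NULL: hole punching passes no extent info */

/* Hypothetical toy model, NOT kernel types: only the fields the patch touches. */
struct toy_inode {
	unsigned long long iversion;
	long long mtime;
	long long ctime;
};

/* Hypothetical stand-in for struct btrfs_replace_extent_info. */
struct toy_extent_info {
	bool update_times; /* false for dedupe or clone with no_time_update */
};

/* Same condition as the patch: NULL info (e.g. hole punching) or update_times. */
static bool toy_should_update_times(const struct toy_extent_info *info)
{
	return !info || info->update_times;
}

/*
 * Replace a range in 'iterations' chunks, one transaction handle per chunk.
 * 'now' is a fake clock that advances once per chunk. Another task commits
 * the transaction right after chunk 'commit_after' and a power failure
 * follows, so the returned copy is what would be found after mounting.
 */
static struct toy_inode toy_replace_range(struct toy_inode *inode,
					  const struct toy_extent_info *info,
					  int iterations, int commit_after,
					  long long now)
{
	struct toy_inode on_disk = *inode; /* state before the operation */

	assert(iterations > 0);
	for (int i = 0; i < iterations; i++) {
		now++;
		/* ... drop/replace some file extent items here ... */

		/* The fix: bump iversion and times before updating the item. */
		inode->iversion++;
		if (toy_should_update_times(info)) {
			inode->mtime = now;
			inode->ctime = inode->mtime;
		}

		/* btrfs_update_inode() analogue, then the handle is released. */
		if (i == commit_after)
			on_disk = *inode; /* committed before the range is done */
	}
	return on_disk;
}
```

For example, with update_times = true, three iterations and a commit after
the first one, the returned copy carries iversion 1 and the first
iteration's mtime/ctime instead of the pre-operation values; in this toy
model, dropping the bump from the loop would leave the committed copy with
the old ones.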
Thread overview: 10+ messages
2022-06-06 9:41 [PATCH 0/3] btrfs: a couple bug fixes around reflinks and fallocate fdmanana
2022-06-06 9:41 ` [PATCH 1/3] btrfs: fix race between reflinking and ordered extent completion fdmanana
2022-06-06 21:36 ` Boris Burkov
2022-06-06 9:41 ` [PATCH 2/3] btrfs: add missing inode updates on each iteration when replacing extents fdmanana
2022-06-06 22:11 ` Boris Burkov [this message]
2022-06-07 9:31 ` Filipe Manana
2022-06-07 16:41 ` Boris Burkov
2022-06-06 9:41 ` [PATCH 3/3] btrfs: do not BUG_ON() on failure to migrate space " fdmanana
2022-06-07 16:44 ` Boris Burkov
2022-06-06 20:45 ` [PATCH 0/3] btrfs: a couple bug fixes around reflinks and fallocate David Sterba