From: Filipe Manana <fdmanana@kernel.org>
To: Sasha Levin <sashal@kernel.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>,
clm@fb.com, jbacik@fb.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
Date: Tue, 29 Mar 2022 10:59:33 +0100 [thread overview]
Message-ID: <YkLYhad7iX2Bv/j1@debian9.Home> (raw)
In-Reply-To: <20220328194157.1585642-17-sashal@kernel.org>
On Mon, Mar 28, 2022 at 03:41:52PM -0400, Sasha Levin wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> [ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]
>
> When an inode has a last_reflink_trans matching the current transaction,
> we have to take special care when logging its checksums in order to
> avoid getting checksum items with overlapping ranges in a log tree,
> which could result in missing checksums after log replay (more on that
> in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
> checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
> fix corrupt log due to concurrent fsync of inodes with shared extents")).
> We also need to make sure a full fsync will copy all old file extent
> items it finds in modified leaves, because they might have been copied
> from some other inode.
>
> However once we fsync an inode, we don't need to keep paying the price of
> that extra special care in future fsyncs done in the same transaction,
> unless the inode is used for another reflink operation or the full sync
> flag is set on it (truncate, failure to allocate extent maps for holes,
> and other exceptional and infrequent cases).
>
> So after we fsync an inode reset its last_unlink_trans to zero. In case
> another reflink happens, we continue to update the last_reflink_trans of
> the inode, just as before. Also set last_reflink_trans to the generation
> of the last transaction that modified the inode whenever we need to set
> the full sync flag on the inode, just like when we need to load an inode
> from disk after eviction.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
What's the motivation to backport this to stable?
It doesn't fix a bug or any regression, as far as I know at least.
Or is it to make some other backport easier?
Thanks.
> ---
> fs/btrfs/btrfs_inode.h | 30 ++++++++++++++++++++++++++++++
> fs/btrfs/file.c | 7 +++----
> fs/btrfs/inode.c | 12 +++++-------
> fs/btrfs/reflink.c | 5 ++---
> fs/btrfs/tree-log.c | 8 ++++++++
> 5 files changed, 48 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index b3e46aabc3d8..d0b52b106041 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -333,6 +333,36 @@ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
> spin_unlock(&inode->lock);
> }
>
> +/*
> + * Should be called while holding the inode's VFS lock in exclusive mode or in a
> + * context where no one else can access the inode concurrently (during inode
> + * creation or when loading an inode from disk).
> + */
> +static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
> +{
> + set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> + /*
> + * The inode may have been part of a reflink operation in the last
> + * transaction that modified it, and then a fsync has reset the
> + * last_reflink_trans to avoid subsequent fsyncs in the same
> + * transaction to do unnecessary work. So update last_reflink_trans
> + * to the last_trans value (we have to be pessimistic and assume a
> + * reflink happened).
> + *
> + * The ->last_trans is protected by the inode's spinlock and we can
> + * have a concurrent ordered extent completion update it. Also set
> + * last_reflink_trans to ->last_trans only if the former is less than
> + * the later, because we can be called in a context where
> + * last_reflink_trans was set to the current transaction generation
> + * while ->last_trans was not yet updated in the current transaction,
> + * and therefore has a lower value.
> + */
> + spin_lock(&inode->lock);
> + if (inode->last_reflink_trans < inode->last_trans)
> + inode->last_reflink_trans = inode->last_trans;
> + spin_unlock(&inode->lock);
> +}
> +
> static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
> {
> bool ret = false;
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a0179cc62913..f38cc706a6cf 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2474,7 +2474,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
> hole_em = alloc_extent_map();
> if (!hole_em) {
> btrfs_drop_extent_cache(inode, offset, end - 1, 0);
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> + btrfs_set_inode_full_sync(inode);
> } else {
> hole_em->start = offset;
> hole_em->len = end - offset;
> @@ -2495,8 +2495,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
> } while (ret == -EEXIST);
> free_extent_map(hole_em);
> if (ret)
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> - &inode->runtime_flags);
> + btrfs_set_inode_full_sync(inode);
> }
>
> return 0;
> @@ -2850,7 +2849,7 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
> * maps for the replacement extents (or holes).
> */
> if (extent_info && !extent_info->is_new_extent)
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> + btrfs_set_inode_full_sync(inode);
>
> if (ret)
> goto out_trans;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5aace4c13519..3783fdf78da8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -423,7 +423,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start,
> goto out;
> }
>
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> + btrfs_set_inode_full_sync(inode);
> out:
> /*
> * Don't forget to free the reserved space, as for inlined extent
> @@ -4882,8 +4882,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
> cur_offset + hole_size - 1, 0);
> hole_em = alloc_extent_map();
> if (!hole_em) {
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> - &inode->runtime_flags);
> + btrfs_set_inode_full_sync(inode);
> goto next;
> }
> hole_em->start = cur_offset;
> @@ -6146,7 +6145,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
> * sync since it will be a full sync anyway and this will blow away the
> * old info in the log.
> */
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
> + btrfs_set_inode_full_sync(BTRFS_I(inode));
>
> key[0].objectid = objectid;
> key[0].type = BTRFS_INODE_ITEM_KEY;
> @@ -8740,7 +8739,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
> * extents beyond i_size to drop.
> */
> if (control.extents_found > 0)
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
> + btrfs_set_inode_full_sync(BTRFS_I(inode));
>
> return ret;
> }
> @@ -10027,8 +10026,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>
> em = alloc_extent_map();
> if (!em) {
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> - &BTRFS_I(inode)->runtime_flags);
> + btrfs_set_inode_full_sync(BTRFS_I(inode));
> goto next;
> }
>
> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index a3930da4eb3f..e37a61ad87df 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -277,7 +277,7 @@ static int clone_copy_inline_extent(struct inode *dst,
> path->slots[0]),
> size);
> btrfs_update_inode_bytes(BTRFS_I(dst), datal, drop_args.bytes_found);
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(dst)->runtime_flags);
> + btrfs_set_inode_full_sync(BTRFS_I(dst));
> ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end);
> out:
> if (!ret && !trans) {
> @@ -575,8 +575,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
> * replaced file extent items.
> */
> if (last_dest_end >= i_size_read(inode))
> - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> - &BTRFS_I(inode)->runtime_flags);
> + btrfs_set_inode_full_sync(BTRFS_I(inode));
>
> ret = btrfs_replace_file_extents(BTRFS_I(inode), path,
> last_dest_end, destoff + len - 1, NULL, &trans);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 6bc8834ac8f7..607527a924c2 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -5836,6 +5836,14 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> if (inode_only != LOG_INODE_EXISTS)
> inode->last_log_commit = inode->last_sub_trans;
> spin_unlock(&inode->lock);
> +
> + /*
> + * Reset the last_reflink_trans so that the next fsync does not need to
> + * go through the slower path when logging extents and their checksums.
> + */
> + if (inode_only == LOG_INODE_ALL)
> + inode->last_reflink_trans = 0;
> +
> out_unlock:
> mutex_unlock(&inode->log_mutex);
> out:
> --
> 2.34.1
>
next prev parent reply other threads:[~2022-03-29 9:59 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-28 19:41 [PATCH AUTOSEL 5.17 01/21] atomics: Fix atomic64_{read_acquire,set_release} fallbacks Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 02/21] locking/lockdep: Iterate lock_classes directly when reading lockdep files Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 03/21] ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 04/21] ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 05/21] sched/tracing: Don't re-read p->state when emitting sched_switch event Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 06/21] sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 07/21] ext4: don't BUG if someone dirty pages without asking ext4 first Sasha Levin
2022-03-28 19:42 ` syzbot
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 08/21] f2fs: fix to do sanity check on curseg->alloc_type Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 09/21] NFSD: Fix nfsd_breaker_owns_lease() return values Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 10/21] f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Sasha Levin
2022-03-29 19:57 ` Omar Sandoval
2022-03-31 16:58 ` Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 13/21] btrfs: make search_csum_tree return 0 if we get -EFBIG Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 14/21] btrfs: handle csum lookup errors properly on reads Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 15/21] btrfs: do not double complete bio on errors during compressed reads Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 16/21] btrfs: do not clean up repair bio if submit fails Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode Sasha Levin
2022-03-29 9:59 ` Filipe Manana [this message]
2022-03-31 16:59 ` Sasha Levin
2022-03-31 17:57 ` Filipe Manana
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 18/21] f2fs: use spin_lock to avoid hang Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 19/21] f2fs: compress: fix to print raw data size in error path of lz4 decompression Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 20/21] Adjust cifssb maximum read size Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 21/21] ntfs: add sanity check on allocation size Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YkLYhad7iX2Bv/j1@debian9.Home \
--to=fdmanana@kernel.org \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=jbacik@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox