All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: join running log transaction when logging new name
@ 2022-07-17 21:05 fdmanana
  2022-07-18 16:07 ` Josef Bacik
  2022-07-18 17:44 ` David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: fdmanana @ 2022-07-17 21:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging a new name, in case of a rename, we pin the log before
changing it. We then either delete a directory entry from the log or
insert a key range item to mark the old name for deletion on log replay.

However when doing one of those log changes we may have another task that
started writing out the log (at btrfs_sync_log()) and it started before
we pinned the log root. So we may end up changing a log tree while its
writeback is being started by another task syncing the log. This can lead
to inconsistencies in a log tree and other unexpected results during log
replay, because we can get some committed node pointing to a node/leaf
that ends up not getting written to disk before the next log commit.

The problem, conceptually, started to happen in commit 88d2beec7e53fc
("btrfs: avoid logging all directory changes during renames"), because
there we started to update the log without joining its current transaction
first.

However the problem only became visible with commit 259c4b96d78dda
("btrfs: stop doing unnecessary log updates during a rename"), and that is
because we used to pin the log at btrfs_rename() and then before entering
btrfs_log_new_name(), when unlinking the old dentry, we ended up at
btrfs_del_inode_ref_in_log() and btrfs_del_dir_entries_in_log(). Both
of them join the current log transaction, effectively waiting for any log
transaction writeout (due to acquiring the root's log_mutex). This made it
safe even after leaving the current log transaction, because we remained
with the log pinned when we called btrfs_log_new_name().

Then in commit 259c4b96d78dda ("btrfs: stop doing unnecessary log updates
during a rename"), we removed the log pinning from btrfs_rename() and
stopped calling btrfs_del_inode_ref_in_log() and
btrfs_del_dir_entries_in_log() during the rename, and started to do all
the needed work at btrfs_log_new_name(), but without joining the current
log transaction, only pinning the log, which is racy because another task
may have started writeout of the log tree right before we pinned the log.

Both commits landed in kernel 5.18, so it doesn't make any practical
difference which should be blamed, but I'm blaming the second commit only
because with the first one, by chance, the problem did not happen due to
the fact we joined the log transaction after pinning the log and unpinned
it only after calling btrfs_log_new_name().

So make btrfs_log_new_name() join the current log transaction instead of
pinning it, so that we never do log updates if it's writeout is starting.

Fixes: 259c4b96d78dda ("btrfs: stop doing unnecessary log updates during a rename")
CC: stable@vger.kernel.org # 5.18+
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1201f083d4db..9acf68ef4a49 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -7029,8 +7029,15 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 		 * anyone from syncing the log until we have updated both inodes
 		 * in the log.
 		 */
+		ret = join_running_log_trans(root);
+		/*
+		 * At least one of the inodes was logged before, so this should
+		 * not fail, but if it does, it's not serious, just bail out and
+		 * mark the log for a full commit.
+		 */
+		if (WARN_ON_ONCE(ret < 0))
+			goto out;
 		log_pinned = true;
-		btrfs_pin_log_trans(root);
 
 		path = btrfs_alloc_path();
 		if (!path) {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: join running log transaction when logging new name
  2022-07-17 21:05 [PATCH] btrfs: join running log transaction when logging new name fdmanana
@ 2022-07-18 16:07 ` Josef Bacik
  2022-07-18 17:44 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Josef Bacik @ 2022-07-18 16:07 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Sun, Jul 17, 2022 at 10:05:05PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When logging a new name, in case of a rename, we pin the log before
> changing it. We then either delete a directory entry from the log or
> insert a key range item to mark the old name for deletion on log replay.
> 
> However when doing one of those log changes we may have another task that
> started writing out the log (at btrfs_sync_log()) and it started before
> we pinned the log root. So we may end up changing a log tree while its
> writeback is being started by another task syncing the log. This can lead
> to inconsistencies in a log tree and other unexpected results during log
> replay, because we can get some committed node pointing to a node/leaf
> that ends up not getting written to disk before the next log commit.
> 
> The problem, conceptually, started to happen in commit 88d2beec7e53fc
> ("btrfs: avoid logging all directory changes during renames"), because
> there we started to update the log without joining its current transaction
> first.
> 
> However the problem only became visible with commit 259c4b96d78dda
> ("btrfs: stop doing unnecessary log updates during a rename"), and that is
> because we used to pin the log at btrfs_rename() and then before entering
> btrfs_log_new_name(), when unlinking the old dentry, we ended up at
> btrfs_del_inode_ref_in_log() and btrfs_del_dir_entries_in_log(). Both
> of them join the current log transaction, effectively waiting for any log
> transaction writeout (due to acquiring the root's log_mutex). This made it
> safe even after leaving the current log transaction, because we remained
> with the log pinned when we called btrfs_log_new_name().
> 
> Then in commit 259c4b96d78dda ("btrfs: stop doing unnecessary log updates
> during a rename"), we removed the log pinning from btrfs_rename() and
> stopped calling btrfs_del_inode_ref_in_log() and
> btrfs_del_dir_entries_in_log() during the rename, and started to do all
> the needed work at btrfs_log_new_name(), but without joining the current
> log transaction, only pinning the log, which is racy because another task
> may have started writeout of the log tree right before we pinned the log.
> 
> Both commits landed in kernel 5.18, so it doesn't make any practical
> difference which should be blamed, but I'm blaming the second commit only
> because with the first one, by chance, the problem did not happen due to
> the fact we joined the log transaction after pinning the log and unpinned
> it only after calling btrfs_log_new_name().
> 
> So make btrfs_log_new_name() join the current log transaction instead of
> pinning it, so that we never do log updates if it's writeout is starting.
> 
> Fixes: 259c4b96d78dda ("btrfs: stop doing unnecessary log updates during a rename")
> CC: stable@vger.kernel.org # 5.18+
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: join running log transaction when logging new name
  2022-07-17 21:05 [PATCH] btrfs: join running log transaction when logging new name fdmanana
  2022-07-18 16:07 ` Josef Bacik
@ 2022-07-18 17:44 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2022-07-18 17:44 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Sun, Jul 17, 2022 at 10:05:05PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When logging a new name, in case of a rename, we pin the log before
> changing it. We then either delete a directory entry from the log or
> insert a key range item to mark the old name for deletion on log replay.
> 
> However when doing one of those log changes we may have another task that
> started writing out the log (at btrfs_sync_log()) and it started before
> we pinned the log root. So we may end up changing a log tree while its
> writeback is being started by another task syncing the log. This can lead
> to inconsistencies in a log tree and other unexpected results during log
> replay, because we can get some committed node pointing to a node/leaf
> that ends up not getting written to disk before the next log commit.
> 
> The problem, conceptually, started to happen in commit 88d2beec7e53fc
> ("btrfs: avoid logging all directory changes during renames"), because
> there we started to update the log without joining its current transaction
> first.
> 
> However the problem only became visible with commit 259c4b96d78dda
> ("btrfs: stop doing unnecessary log updates during a rename"), and that is
> because we used to pin the log at btrfs_rename() and then before entering
> btrfs_log_new_name(), when unlinking the old dentry, we ended up at
> btrfs_del_inode_ref_in_log() and btrfs_del_dir_entries_in_log(). Both
> of them join the current log transaction, effectively waiting for any log
> transaction writeout (due to acquiring the root's log_mutex). This made it
> safe even after leaving the current log transaction, because we remained
> with the log pinned when we called btrfs_log_new_name().
> 
> Then in commit 259c4b96d78dda ("btrfs: stop doing unnecessary log updates
> during a rename"), we removed the log pinning from btrfs_rename() and
> stopped calling btrfs_del_inode_ref_in_log() and
> btrfs_del_dir_entries_in_log() during the rename, and started to do all
> the needed work at btrfs_log_new_name(), but without joining the current
> log transaction, only pinning the log, which is racy because another task
> may have started writeout of the log tree right before we pinned the log.
> 
> Both commits landed in kernel 5.18, so it doesn't make any practical
> difference which should be blamed, but I'm blaming the second commit only
> because with the first one, by chance, the problem did not happen due to
> the fact we joined the log transaction after pinning the log and unpinned
> it only after calling btrfs_log_new_name().
> 
> So make btrfs_log_new_name() join the current log transaction instead of
> pinning it, so that we never do log updates if it's writeout is starting.
> 
> Fixes: 259c4b96d78dda ("btrfs: stop doing unnecessary log updates during a rename")
> CC: stable@vger.kernel.org # 5.18+
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-07-18 17:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-17 21:05 [PATCH] btrfs: join running log transaction when logging new name fdmanana
2022-07-18 16:07 ` Josef Bacik
2022-07-18 17:44 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.