[PATCH 0/2] btrfs: do not commit transaction to avoid deadly ENOSPC trap

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] btrfs: do not commit transaction to avoid deadly ENOSPC trap
@ 2023-07-27  6:07 Qu Wenruo
  2023-07-27  6:07 ` [PATCH 1/2] btrfs: do not commit transaction after adding one device Qu Wenruo
  2023-07-27  6:07 ` [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace Qu Wenruo
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-07-27  6:07 UTC (permalink / raw)
  To: linux-btrfs

There is a report that a user hits a deadly ENOSPC trap, where the fs
would immediately falls read-only when commit any transaction.

To make the case worse, the fs is using RAID1C4 (later converted to
RAID1C3), this makes it impossible to add any extra disks.

As the kernel would commit the current transaction after just adding the
first device, while RAID1C4 needs 4 new disks, this would still trigger
ENOSPC during transaction committing.

The first patch would address the problem by not committing the
transaction adding a new device.

This would cause a behavior change, would need to co-operate with
btrfs-progs to allow end users to determine if they want to sync the fs
after adding all devices.

The second patch is to address a rare corner case hit by the same
reporter, that canceling a suspended replace would also trigger the
deadly ENOSPC trap.

Qu Wenruo (2):
  btrfs: do not commit transaction after adding one device
  btrfs: do not commit transaction canceling a suspended replace

 fs/btrfs/dev-replace.c | 10 ----------
 fs/btrfs/volumes.c     | 12 ++----------
 2 files changed, 2 insertions(+), 20 deletions(-)

-- 
2.41.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] btrfs: do not commit transaction after adding one device
  2023-07-27  6:07 [PATCH 0/2] btrfs: do not commit transaction to avoid deadly ENOSPC trap Qu Wenruo
@ 2023-07-27  6:07 ` Qu Wenruo
  2023-07-27  6:07 ` [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace Qu Wenruo
  1 sibling, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-07-27  6:07 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
There is at least one report that btrfs falls to deadly ENOSPC
situation, where without adding new space btrfs would fail to commit any
transaction due to ENOSPC.

However this last resort device adding would not work if the filesystem
is using multi-device metadata profiles (RAID1*, RAID0, RAID5/6), as
after each device added, btrfs would commit the transaction, falling
back to read-only due to ENOSPC.

[CAUSE]
Of course we should not fail a transaction due to ENOSPC in the first
place, as this means at metadata reservation stage we did wrong
calculation.

But the current behavior of device add is also a little too strict,
considering device adding is the last-resort for ENOSPC, we should at
least allow end users to determine if they want to commit transaction or
not.

[FIX]
Instead of committing transaction, just end the current one.

Whether the fs would be synced would be determined at user space
(normally by btrfs-progs) instead.

Link: https://lore.kernel.org/linux-btrfs/CA+W5K0r4Lv4fPf+mWWf-ppgsjyz+wOKdBRgBR6UnQafwL7HPtg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70d69d4b44d2..eda90d1c881f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2789,7 +2789,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		btrfs_sysfs_update_sprout_fsid(fs_devices);
 	}

-	ret = btrfs_commit_transaction(trans);
+	ret = btrfs_end_transaction(trans);
+	trans = NULL;

 	if (seeding_dev) {
 		mutex_unlock(&uuid_mutex);
@@ -2803,15 +2804,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		if (ret < 0)
 			btrfs_handle_fs_error(fs_info, ret,
 				    "Failed to relocate sys chunks after device initialization. This can be fixed using the \"btrfs balance\" command.");
-		trans = btrfs_attach_transaction(root);
-		if (IS_ERR(trans)) {
-			if (PTR_ERR(trans) == -ENOENT)
-				return 0;
-			ret = PTR_ERR(trans);
-			trans = NULL;
-			goto error_sysfs;
-		}
-		ret = btrfs_commit_transaction(trans);
 	}

 	/*
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace
  2023-07-27  6:07 [PATCH 0/2] btrfs: do not commit transaction to avoid deadly ENOSPC trap Qu Wenruo
  2023-07-27  6:07 ` [PATCH 1/2] btrfs: do not commit transaction after adding one device Qu Wenruo
@ 2023-07-27  6:07 ` Qu Wenruo
  2023-07-27 12:09   ` David Sterba
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2023-07-27  6:07 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
There is a very rare corner case that, if the filesystem falls into a
deadly ENOSPC trap (metadata is so full that committing a transaction
would trigger transaction abort and falls RO), and the user needs to
cancel a suspended dev-replace, it would fail.

This is because the dev-replace canceling itself would commit the
current transaction, and falls RO first.

[CAUSE]
There are two involved situations:

- Cancel a running dev-replace
  We just call btrfs_scrub_cancel(), it doesn't commit transaction
  anyway.

- Cancel a suspended dev-replace
  We only need to cleanup the various in-memory replace structure, which
  is no difference than the previous situation.

  But in this case we commit transaction, and may trigger the deadly
  ENOSPC trap.

[FIX]
Just follow the first case, do not commit transaction when not needed.

Link: https://lore.kernel.org/linux-btrfs/CA+W5K0pQyJH5zWxs4JxfHR06DSUWDOcDPNsKxbdKQ_CiUtpyUg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/dev-replace.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index fff22ed55c42..35590f17a5d7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1056,8 +1056,6 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	struct btrfs_device *tgt_device = NULL;
 	struct btrfs_device *src_device = NULL;
-	struct btrfs_trans_handle *trans;
-	struct btrfs_root *root = fs_info->tree_root;
 	int result;
 	int ret;
 
@@ -1112,14 +1110,6 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 		/* Scrub for replace must not be running in suspended state */
 		btrfs_scrub_cancel(fs_info);
 
-		trans = btrfs_start_transaction(root, 0);
-		if (IS_ERR(trans)) {
-			mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
-			return PTR_ERR(trans);
-		}
-		ret = btrfs_commit_transaction(trans);
-		WARN_ON(ret);
-
 		btrfs_info_in_rcu(fs_info,
 		"suspended dev_replace from %s (devid %llu) to %s canceled",
 			btrfs_dev_name(src_device), src_device->devid,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace
  2023-07-27  6:07 ` [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace Qu Wenruo
@ 2023-07-27 12:09   ` David Sterba
  2023-07-27 22:26     ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2023-07-27 12:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jul 27, 2023 at 02:07:54PM +0800, Qu Wenruo wrote:
> [BUG]
> There is a very rare corner case that, if the filesystem falls into a
> deadly ENOSPC trap (metadata is so full that committing a transaction
> would trigger transaction abort and falls RO), and the user needs to
> cancel a suspended dev-replace, it would fail.
> 
> This is because the dev-replace canceling itself would commit the
> current transaction, and falls RO first.
> 
> [CAUSE]
> There are two involved situations:
> 
> - Cancel a running dev-replace
>   We just call btrfs_scrub_cancel(), it doesn't commit transaction
>   anyway.
> 
> - Cancel a suspended dev-replace
>   We only need to cleanup the various in-memory replace structure, which
>   is no difference than the previous situation.

There's dev_replace->item_needs_writeback = 1, and somewhere in
transaction commit it's synced to disk.

btrfs_commit_transaction
  commit_cowonly_roots
    btrfs_run_dev_replace
      item_needs_writeback is checked

so it's not just cleaning the memory structures, it also stores the
state on disk. A delayed commit will have to write it again, which means
that the problem is only postponed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace
  2023-07-27 12:09   ` David Sterba
@ 2023-07-27 22:26     ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-07-27 22:26 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2023/7/27 20:09, David Sterba wrote:
> On Thu, Jul 27, 2023 at 02:07:54PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is a very rare corner case that, if the filesystem falls into a
>> deadly ENOSPC trap (metadata is so full that committing a transaction
>> would trigger transaction abort and falls RO), and the user needs to
>> cancel a suspended dev-replace, it would fail.
>>
>> This is because the dev-replace canceling itself would commit the
>> current transaction, and falls RO first.
>>
>> [CAUSE]
>> There are two involved situations:
>>
>> - Cancel a running dev-replace
>>    We just call btrfs_scrub_cancel(), it doesn't commit transaction
>>    anyway.
>>
>> - Cancel a suspended dev-replace
>>    We only need to cleanup the various in-memory replace structure, which
>>    is no difference than the previous situation.
>
> There's dev_replace->item_needs_writeback = 1, and somewhere in
> transaction commit it's synced to disk.
>
> btrfs_commit_transaction
>    commit_cowonly_roots
>      btrfs_run_dev_replace
>        item_needs_writeback is checked
>
> so it's not just cleaning the memory structures, it also stores the
> state on disk. A delayed commit will have to write it again, which means
> that the problem is only postponed.

That may be exactly what the end users wants.

Delaying the commit so that they can add the extra device (with patch 1)
to escape the deadly ENOSPC trap.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-27 22:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-27  6:07 [PATCH 0/2] btrfs: do not commit transaction to avoid deadly ENOSPC trap Qu Wenruo
2023-07-27  6:07 ` [PATCH 1/2] btrfs: do not commit transaction after adding one device Qu Wenruo
2023-07-27  6:07 ` [PATCH 2/2] btrfs: do not commit transaction canceling a suspended replace Qu Wenruo
2023-07-27 12:09   ` David Sterba
2023-07-27 22:26     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).