[PATCH 0/4] btrfs: enhance primary super block writeback errors

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] btrfs: enhance primary super block writeback errors
@ 2025-08-28  4:16 Qu Wenruo
  2025-08-28  4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  4:16 UTC (permalink / raw)
  To: linux-btrfs

Mark recently exposed a bug that btrfs can add zero sized devices into
an fs.

Inspired by his fix, I also exposed a situation which is pretty
concerning:

 # lsblk  /dev/loop0
 NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
 loop0   7:0    0  64K  0 loop
 # mount /dev/test/scratch1  /mnt/btrfs/
 # btrfs device add -f /dev/loop0 /mnt/btrfs/
 Performing full device TRIM /dev/loop0 (64.00KiB) ...
 # umount /mnt/btrfs
 # mount /dev/test/scratch1 /mnt/btrfs
 mount: /mnt/btrfs: fsconfig() failed: No such file or directory.
       dmesg(1) may have more information after failed mount system call.
 # dmesg -t | tail -n4
 BTRFS info (device dm-3): using crc32c (crc32c-lib) checksum algorithm
 BTRFS error (device dm-3): devid 2 uuid e449b62e-faca-4cbd-b8cb-bcfb5c549f0b is missing
 BTRFS error (device dm-3): failed to read chunk tree: -2
 BTRFS error (device dm-3): open_ctree failed: -2

That device is too small to even have the primary super block, thus
btrfs just skips it, resulting the fs unable to find the new device in
the next mount.
(Thankfully this can be fixed by mounting degraded and remove the tiny
device)

This exposed several problems related to the super block writeback
behavior:

- We should always try to writeback the primary super block
  If the device is too small, it shouldn't be added in the first place.

  And even if such device is added, we should hit an critical error
  during the first transaction on that device.

- Primary super block write failure is ignored in write_dev_supers()
  Unlike wait_dev_supers(), if we failed to submit the primary sb, but
  succeeded submitting the backup ones, we still call it a aday.

- We treat super block writeback errors too loosely
  Btrfs will not error out even if there is only one device finished the
  super block.

Enhance the error handling by:

- Treat primary sb writeback error more seriously
  In fact we don't care that much about backups, and wait_dev_supers()
  is already doing that.

  So we don't need an atomic_t to track all sb writeback errors, but
  only a single flag for the primary sb.

- Treat newly added device more seriously
  If the super block write into the newly added device failed, it will
  prevent the fs to be mounted, as btrfs can not find the device.

  So here we introduce a concept, critical device, that if sb writeback
  failed for those devices, the transaction should be aborted.

- Treat sb writeback error as if the device is missing
  And if the fs can not handle the missing device and maintain RW, we
  should flip RO.

- Revert failed new device if btrfs_init_new_device() failed
  This can be a problem of the new device itself.
  We should remove the new device if the btrfs_commit_transaction() call
  failed.


All those enhancements lead to a pretty interesting error handling
situation, where a too small device can be added to btrfs, but it will
not commit, and the original fs can still be remounted again correctly:

 # lsblk  /dev/loop0
 NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
 loop0   7:0    0  64K  0 loop
 # mount /dev/test/scratch1  /mnt/btrfs/
 # btrfs device add -f /dev/loop0 /mnt/btrfs/
 Performing full device TRIM /dev/loop0 (64.00KiB) ...
 ERROR: error adding device '/dev/loop0': Input/output error
 # umount /mnt/btrfs
 # mount /dev/test/scratch1 /mnt/btrfs
 ^^^ This will succeed, as the new device is not committed

Qu Wenruo (4):
  btrfs: enhance primary super block write error detection
  btrfs: return error if the primary super block write to a new device
    failed
  btrfs: treat super block write back error more seriously
  btrfs: add extra error handling for btrfs_init_new_device()

 fs/btrfs/disk-io.c | 93 ++++++++++++++++++++++++++++++++++------------
 fs/btrfs/volumes.c |  9 +++--
 fs/btrfs/volumes.h |  8 +---
 3 files changed, 78 insertions(+), 32 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] btrfs: enhance primary super block write error detection
  2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
@ 2025-08-28  4:16 ` Qu Wenruo
  2025-08-28  4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  4:16 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs has two functions handling super blocks writeback,
write_dev_super() and wait_dev_super().

However there are some inconsistency and missing corner cases:

- Primary super block writeback can be skipped for tiny devices
  Although such small devices shouldn't be added in the first place, we
  lack the proper checks inside btrfs_init_new_device().

  E.g. a device with only 64K bytes can be added to btrfs, and the
  primary super block writeback will be skipped, causing future mounts
  to fail, as there will be no way to find that never-to-be-written super
  block.

- Inconsistency between write_dev_super() and wait_dev_super()
  write_dev_super() will not report error as long as one super block
  is submitted.

  However wait_dev_super() will report error as long as the primary one
  failed.

Enhance the primary super block error detection by:

- Use a device state flag to indicate failed primary super block error
  So that we save an atomic_t.

- Always try to submit the primary super block
  If the device can not hold the primary super block, treat it as an
  error.

  This applies to both write_dev_super() and wait_dev_super().

- Return error if the primary super block write/wait failed
  This mostly affects write_dev_super(), as wait_dev_super() is already
  doing this behavior.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 50 ++++++++++++++++++++++++++++------------------
 fs/btrfs/volumes.h |  7 +------
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7b06bbc40898..90577fc73b3b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3690,10 +3690,7 @@ static void btrfs_end_super_write(struct bio *bio)
 						     BTRFS_DEV_STAT_WRITE_ERRS);
 			/* Ensure failure if the primary sb fails. */
 			if (bio->bi_opf & REQ_FUA)
-				atomic_add(BTRFS_SUPER_PRIMARY_WRITE_ERROR,
-					   &device->sb_write_errors);
-			else
-				atomic_inc(&device->sb_write_errors);
+				set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR, &device->dev_state);
 		}
 		folio_unlock(fi.folio);
 		folio_put(fi.folio);
@@ -3718,11 +3715,12 @@ static int write_dev_supers(struct btrfs_device *device,
 	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct address_space *mapping = device->bdev->bd_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	bool primary_failed = false;
 	int i;
 	int ret;
 	u64 bytenr, bytenr_orig;
 
-	atomic_set(&device->sb_write_errors, 0);
+	clear_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR, &device->dev_state);
 
 	if (max_mirrors == 0)
 		max_mirrors = BTRFS_SUPER_MIRROR_MAX;
@@ -3738,17 +3736,26 @@ static int write_dev_supers(struct btrfs_device *device,
 		bytenr_orig = btrfs_sb_offset(i);
 		ret = btrfs_sb_log_location(device, i, WRITE, &bytenr);
 		if (ret == -ENOENT) {
+			if (i == 0)
+				primary_failed = true;
 			continue;
 		} else if (ret < 0) {
 			btrfs_err(device->fs_info,
 			  "couldn't get super block location for mirror %d error %d",
 			  i, ret);
-			atomic_inc(&device->sb_write_errors);
+			if (i == 0) {
+				set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+					&device->dev_state);
+				primary_failed = true;
+			}
 			continue;
 		}
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
-		    device->commit_total_bytes)
+		    device->commit_total_bytes) {
+			if (i == 0)
+				primary_failed = true;
 			break;
+		}
 
 		btrfs_set_super_bytenr(sb, bytenr_orig);
 
@@ -3763,7 +3770,11 @@ static int write_dev_supers(struct btrfs_device *device,
 			btrfs_err(device->fs_info,
 			  "couldn't get super block page for bytenr %llu error %ld",
 			  bytenr, PTR_ERR(folio));
-			atomic_inc(&device->sb_write_errors);
+			if (i == 0) {
+				set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+					&device->dev_state);
+				primary_failed = true;
+			}
 			continue;
 		}
 
@@ -3793,10 +3804,11 @@ static int write_dev_supers(struct btrfs_device *device,
 			bio->bi_opf |= REQ_FUA;
 		submit_bio(bio);
 
-		if (btrfs_advance_sb_log(device, i))
-			atomic_inc(&device->sb_write_errors);
+		btrfs_advance_sb_log(device, i);
 	}
-	return atomic_read(&device->sb_write_errors) < i ? 0 : -1;
+	if (primary_failed)
+		return -1;
+	return 0;
 }
 
 /*
@@ -3809,7 +3821,6 @@ static int write_dev_supers(struct btrfs_device *device,
 static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 {
 	int i;
-	int errors = 0;
 	bool primary_failed = false;
 	int ret;
 	u64 bytenr;
@@ -3824,14 +3835,16 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		if (ret == -ENOENT) {
 			break;
 		} else if (ret < 0) {
-			errors++;
 			if (i == 0)
 				primary_failed = true;
 			continue;
 		}
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
-		    device->commit_total_bytes)
+		    device->commit_total_bytes) {
+			if (i == 0)
+				primary_failed = true;
 			break;
+		}
 
 		folio = filemap_get_folio(device->bdev->bd_mapping,
 					  bytenr >> PAGE_SHIFT);
@@ -3844,16 +3857,15 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		folio_put(folio);
 	}
 
-	errors += atomic_read(&device->sb_write_errors);
-	if (errors >= BTRFS_SUPER_PRIMARY_WRITE_ERROR)
-		primary_failed = true;
+	if (!primary_failed)
+		primary_failed = test_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+					  &device->dev_state);
 	if (primary_failed) {
 		btrfs_err(device->fs_info, "error writing primary super block to device %llu",
 			  device->devid);
 		return -1;
 	}
-
-	return errors < i ? 0 : -1;
+	return 0;
 }
 
 /*
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2cbf8080eade..f69bc72680fb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -100,6 +100,7 @@ enum btrfs_raid_types {
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 #define BTRFS_DEV_STATE_NO_READA	(5)
+#define BTRFS_DEV_STATE_SB_WRITE_ERROR	(6)
 
 /* Special value encoding failure to write primary super block. */
 #define BTRFS_SUPER_PRIMARY_WRITE_ERROR		(INT_MAX / 2)
@@ -155,12 +156,6 @@ struct btrfs_device {
 	/* type and info about this device */
 	u64 type;
 
-	/*
-	 * Counter of super block write errors, values larger than
-	 * BTRFS_SUPER_PRIMARY_WRITE_ERROR encode primary super block write failure.
-	 */
-	atomic_t sb_write_errors;
-
 	/* minimal io size for this device */
 	u32 sector_size;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed
  2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
  2025-08-28  4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
@ 2025-08-28  4:16 ` Qu Wenruo
  2025-08-28  4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  4:16 UTC (permalink / raw)
  To: linux-btrfs

When adding a new device to btrfs, btrfs adds the device and relying on
the regular transaction commit to put a new super block to that device.

However during that transaction commit, if the primary super block write
back failed for that device, the fs will not be able to be mounted
unless degraded.

This is because the primary super block failed to be written to that new
device, thus btrfs is unable to locate that new device thus it will be
treated as missing.

This means for the initial transaction commit on that new device, it is
more critical than other devices.

Treat primary super block write back to a new device as a critical error,
so that if that error happened, we will abort the transaction.

Although transaction abort is annoying, it keeps the fs to be mountable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 41 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.c |  3 ++-
 fs/btrfs/volumes.h |  1 +
 3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 90577fc73b3b..9a8c07a0986d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3999,6 +3999,21 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }
 
+static bool is_critical_device(struct btrfs_device *dev)
+{
+	/*
+	 * New device primary super block writeback is not tolerated.
+	 *
+	 * As if a power loss after the current transaction, the new device
+	 * has no primary super block, and btrfs will refuse to mount.
+	 * Although it's still possible to mount the fs degraded since
+	 * there is no bgs on that device, it's better to error out now.
+	 */
+	if (test_bit(BTRFS_DEV_STATE_NEW, &dev->dev_state))
+		return true;
+	return false;
+}
+
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 {
 	struct list_head *head;
@@ -4009,6 +4024,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 	int do_barriers;
 	int max_errors;
 	int total_errors = 0;
+	bool critical_error = false;
 	u64 flags;
 
 	do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
@@ -4039,6 +4055,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		}
 	}
 
+	/* Start from newly added device, to detect problems of them early. */
 	list_for_each_entry(dev, head, dev_list) {
 		if (!dev->bdev) {
 			total_errors++;
@@ -4074,10 +4091,18 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		}
 
 		ret = write_dev_supers(dev, sb, max_mirrors);
-		if (ret)
+		if (ret) {
 			total_errors++;
+			if (is_critical_device(dev)) {
+				btrfs_crit(fs_info,
+					   "failed to write super blocks for device %llu",
+					   dev->devid);
+				critical_error = true;
+				break;
+			}
+		}
 	}
-	if (total_errors > max_errors) {
+	if (total_errors > max_errors || critical_error) {
 		btrfs_err(fs_info, "%d errors while writing supers",
 			  total_errors);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -4098,11 +4123,19 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 			continue;
 
 		ret = wait_dev_supers(dev, max_mirrors);
-		if (ret)
+		if (ret) {
 			total_errors++;
+			if (is_critical_device(dev)) {
+				btrfs_crit(fs_info,
+					   "failed to wait super blocks for device %llu",
+					   dev->devid);
+				critical_error = true;
+				break;
+			}
+		}
 	}
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-	if (total_errors > max_errors) {
+	if (total_errors > max_errors || critical_error) {
 		btrfs_handle_fs_error(fs_info, -EIO,
 				      "%d errors while writing supers",
 				      total_errors);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a031aafe253f..2106190e972b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2766,6 +2766,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	}
 
 	set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+	set_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
 	device->generation = trans->transid;
 	device->io_width = fs_info->sectorsize;
 	device->io_align = fs_info->sectorsize;
@@ -2865,7 +2866,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	}
 
 	ret = btrfs_commit_transaction(trans);
-
+	clear_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
 	if (seeding_dev) {
 		mutex_unlock(&uuid_mutex);
 		up_write(&sb->s_umount);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f69bc72680fb..8c571f66acff 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -101,6 +101,7 @@ enum btrfs_raid_types {
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 #define BTRFS_DEV_STATE_NO_READA	(5)
 #define BTRFS_DEV_STATE_SB_WRITE_ERROR	(6)
+#define BTRFS_DEV_STATE_NEW		(7)
 
 /* Special value encoding failure to write primary super block. */
 #define BTRFS_SUPER_PRIMARY_WRITE_ERROR		(INT_MAX / 2)
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] btrfs: treat super block write back error more seriously
  2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
  2025-08-28  4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
  2025-08-28  4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
@ 2025-08-28  4:16 ` Qu Wenruo
  2025-08-28  4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
  2025-08-28  5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
  4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  4:16 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs treats super block writebacks pretty loosely.

As long as there is at least one device writes its super block back
correctly, we will call it a day and continue.

However this is too loose, we should treat super block writeback error
more seriously, as the writeback error as the device is missing.
And if we hit a device missing, we should flips the fs RO if the fs can
no longer maintain reliable RW writes.

So here enhance the is_critical_device() handling, and if the failed
device will prevent btrfs from maintaining reliable RW operations,
treat it as a critical error, which will abort the current transaction
and flips the fs RO.

Although this brings some overhead for error handling. Now we call
btrfs_check_rw_degradable(), which depends on the profiles, it may needs
to iterate through all chunk maps to make sure the fs can still maintain
RW operations.

But that only affects error path, I'd say one should not expect full
performance when the device can not handle a critical super block
writeback.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9a8c07a0986d..aa7696a28ac2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3999,7 +3999,7 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }

-static bool is_critical_device(struct btrfs_device *dev)
+static bool is_critical_device(struct btrfs_fs_info *fs_info, struct btrfs_device *dev)
 {
 	/*
 	 * New device primary super block writeback is not tolerated.
@@ -4011,6 +4011,8 @@ static bool is_critical_device(struct btrfs_device *dev)
 	 */
 	if (test_bit(BTRFS_DEV_STATE_NEW, &dev->dev_state))
 		return true;
+	if (!btrfs_check_rw_degradable(fs_info, dev))
+		return true;
 	return false;
 }

@@ -4093,7 +4095,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		ret = write_dev_supers(dev, sb, max_mirrors);
 		if (ret) {
 			total_errors++;
-			if (is_critical_device(dev)) {
+			if (is_critical_device(fs_info, dev)) {
 				btrfs_crit(fs_info,
 					   "failed to write super blocks for device %llu",
 					   dev->devid);
@@ -4125,7 +4127,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		ret = wait_dev_supers(dev, max_mirrors);
 		if (ret) {
 			total_errors++;
-			if (is_critical_device(dev)) {
+			if (is_critical_device(fs_info, dev)) {
 				btrfs_crit(fs_info,
 					   "failed to wait super blocks for device %llu",
 					   dev->devid);
-- 
2.50.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device()
  2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
                   ` (2 preceding siblings ...)
  2025-08-28  4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
@ 2025-08-28  4:16 ` Qu Wenruo
  2025-08-28  5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
  4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  4:16 UTC (permalink / raw)
  To: linux-btrfs

With all the new error handling for super block writebacks, now a tiny
device add will fail as expected:

 # lsblk  /dev/loop0
 NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
 loop0   7:0    0  64K  0 loop
 # mount /dev/test/scratch1  /mnt/btrfs/
 # btrfs device add -f /dev/loop0 /mnt/btrfs/
 Performing full device TRIM /dev/loop0 (64.00KiB) ...
 ERROR: error adding device '/dev/loop0': Input/output error

And the on-disk super block is still pointing to the older chunk tree:

 # btrfs ins dump-tree -t chunk /dev/test/scratch1
 ...
         item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
                devid 1 total_bytes 10737418240 bytes_used 562036736
                io_align 4096 io_width 4096 sector_size 4096 type 0
                generation 0 start_offset 0 dev_group 0
                seek_speed 0 bandwidth 0
                uuid 87358ef3-2cd2-4a4c-b607-4cba0cc80a3b
                fsid 26f7058c-2604-4293-a580-6efc653b849d
        item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80

But after the above error, we still can not properly mount the original
fs again:

 # umount /mnt/btrfs
 # mount /dev/test/scratch1 /mnt/btrfs
 mount: /mnt/btrfs: can't read superblock on /dev/mapper/test-scratch1.
       dmesg(1) may have more information after failed mount system call.
 # dmesg -t | tail -n 3
 BTRFS info (device dm-2): using crc32c (crc32c-lib) checksum algorithm
 BTRFS error (device dm-2): failed to read devices
 BTRFS error (device dm-2): open_ctree failed: -5

The mount failure is caused by the missing error handling of the
btrfs_commit_transaction() inside btrfs_init_new_device().

If that call failed, we should remove the new device, or it will pollute
the fs_devices, causing newer mounts to fail until the device is
forgotten manually.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2106190e972b..0f2a93824db1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2867,14 +2867,16 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	ret = btrfs_commit_transaction(trans);
 	clear_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
+	if (ret < 0) {
+		trans = NULL;
+		goto error_sysfs;
+	}
+
 	if (seeding_dev) {
 		mutex_unlock(&uuid_mutex);
 		up_write(&sb->s_umount);
 		locked = false;
 
-		if (ret) /* transaction commit */
-			return ret;
-
 		ret = btrfs_relocate_sys_chunks(fs_info);
 		if (ret < 0)
 			btrfs_handle_fs_error(fs_info, ret,
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] btrfs: enhance primary super block writeback errors
  2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
                   ` (3 preceding siblings ...)
  2025-08-28  4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
@ 2025-08-28  5:20 ` Qu Wenruo
  4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28  5:20 UTC (permalink / raw)
  To: linux-btrfs



在 2025/8/28 13:46, Qu Wenruo 写道:
> Mark recently exposed a bug that btrfs can add zero sized devices into
> an fs.
> 
> Inspired by his fix, I also exposed a situation which is pretty
> concerning:
> 
>   # lsblk  /dev/loop0
>   NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
>   loop0   7:0    0  64K  0 loop
>   # mount /dev/test/scratch1  /mnt/btrfs/
>   # btrfs device add -f /dev/loop0 /mnt/btrfs/
>   Performing full device TRIM /dev/loop0 (64.00KiB) ...
>   # umount /mnt/btrfs
>   # mount /dev/test/scratch1 /mnt/btrfs
>   mount: /mnt/btrfs: fsconfig() failed: No such file or directory.
>         dmesg(1) may have more information after failed mount system call.
>   # dmesg -t | tail -n4
>   BTRFS info (device dm-3): using crc32c (crc32c-lib) checksum algorithm
>   BTRFS error (device dm-3): devid 2 uuid e449b62e-faca-4cbd-b8cb-bcfb5c549f0b is missing
>   BTRFS error (device dm-3): failed to read chunk tree: -2
>   BTRFS error (device dm-3): open_ctree failed: -2
> 
> That device is too small to even have the primary super block, thus
> btrfs just skips it, resulting the fs unable to find the new device in
> the next mount.
> (Thankfully this can be fixed by mounting degraded and remove the tiny
> device)
> 
> This exposed several problems related to the super block writeback
> behavior:
> 
> - We should always try to writeback the primary super block
>    If the device is too small, it shouldn't be added in the first place.
> 
>    And even if such device is added, we should hit an critical error
>    during the first transaction on that device.
> 
> - Primary super block write failure is ignored in write_dev_supers()
>    Unlike wait_dev_supers(), if we failed to submit the primary sb, but
>    succeeded submitting the backup ones, we still call it a aday.
> 
> - We treat super block writeback errors too loosely
>    Btrfs will not error out even if there is only one device finished the
>    super block.
> 
> Enhance the error handling by:
> 
> - Treat primary sb writeback error more seriously
>    In fact we don't care that much about backups, and wait_dev_supers()
>    is already doing that.
> 
>    So we don't need an atomic_t to track all sb writeback errors, but
>    only a single flag for the primary sb.
> 
> - Treat newly added device more seriously
>    If the super block write into the newly added device failed, it will
>    prevent the fs to be mounted, as btrfs can not find the device.
> 
>    So here we introduce a concept, critical device, that if sb writeback
>    failed for those devices, the transaction should be aborted.
> 
> - Treat sb writeback error as if the device is missing
>    And if the fs can not handle the missing device and maintain RW, we
>    should flip RO.

This makes btrfs more strict on dmerror related errors.

Previously we ignore those primary sb writes errors, thus it won't have 
long lasting effects.
But now we flips the fs RO, result btrfs/146 and btrfs/160 to fail, all 
due to the new RO behaviors.

Considering we're just being more strict, I'm not that concerned of 
those failures.
If we are determine to go this path, we can definitely enhance the test 
cases to handle those errors.

Thanks,
Qu

> 
> - Revert failed new device if btrfs_init_new_device() failed
>    This can be a problem of the new device itself.
>    We should remove the new device if the btrfs_commit_transaction() call
>    failed.
> 
> 
> All those enhancements lead to a pretty interesting error handling
> situation, where a too small device can be added to btrfs, but it will
> not commit, and the original fs can still be remounted again correctly:
> 
>   # lsblk  /dev/loop0
>   NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
>   loop0   7:0    0  64K  0 loop
>   # mount /dev/test/scratch1  /mnt/btrfs/
>   # btrfs device add -f /dev/loop0 /mnt/btrfs/
>   Performing full device TRIM /dev/loop0 (64.00KiB) ...
>   ERROR: error adding device '/dev/loop0': Input/output error
>   # umount /mnt/btrfs
>   # mount /dev/test/scratch1 /mnt/btrfs
>   ^^^ This will succeed, as the new device is not committed
> 
> Qu Wenruo (4):
>    btrfs: enhance primary super block write error detection
>    btrfs: return error if the primary super block write to a new device
>      failed
>    btrfs: treat super block write back error more seriously
>    btrfs: add extra error handling for btrfs_init_new_device()
> 
>   fs/btrfs/disk-io.c | 93 ++++++++++++++++++++++++++++++++++------------
>   fs/btrfs/volumes.c |  9 +++--
>   fs/btrfs/volumes.h |  8 +---
>   3 files changed, 78 insertions(+), 32 deletions(-)
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-08-28  5:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28  4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
2025-08-28  4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
2025-08-28  4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
2025-08-28  4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
2025-08-28  4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
2025-08-28  5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).