* [PATCH 0/4] btrfs: enhance primary super block writeback errors
@ 2025-08-28 4:16 Qu Wenruo
2025-08-28 4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 4:16 UTC (permalink / raw)
To: linux-btrfs
Mark recently exposed a bug that btrfs can add zero sized devices into
an fs.
Inspired by his fix, I also exposed a situation which is pretty
concerning:
# lsblk /dev/loop0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 64K 0 loop
# mount /dev/test/scratch1 /mnt/btrfs/
# btrfs device add -f /dev/loop0 /mnt/btrfs/
Performing full device TRIM /dev/loop0 (64.00KiB) ...
# umount /mnt/btrfs
# mount /dev/test/scratch1 /mnt/btrfs
mount: /mnt/btrfs: fsconfig() failed: No such file or directory.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n4
BTRFS info (device dm-3): using crc32c (crc32c-lib) checksum algorithm
BTRFS error (device dm-3): devid 2 uuid e449b62e-faca-4cbd-b8cb-bcfb5c549f0b is missing
BTRFS error (device dm-3): failed to read chunk tree: -2
BTRFS error (device dm-3): open_ctree failed: -2
That device is too small to even have the primary super block, thus
btrfs just skips it, resulting the fs unable to find the new device in
the next mount.
(Thankfully this can be fixed by mounting degraded and remove the tiny
device)
This exposed several problems related to the super block writeback
behavior:
- We should always try to writeback the primary super block
If the device is too small, it shouldn't be added in the first place.
And even if such device is added, we should hit an critical error
during the first transaction on that device.
- Primary super block write failure is ignored in write_dev_supers()
Unlike wait_dev_supers(), if we failed to submit the primary sb, but
succeeded submitting the backup ones, we still call it a aday.
- We treat super block writeback errors too loosely
Btrfs will not error out even if there is only one device finished the
super block.
Enhance the error handling by:
- Treat primary sb writeback error more seriously
In fact we don't care that much about backups, and wait_dev_supers()
is already doing that.
So we don't need an atomic_t to track all sb writeback errors, but
only a single flag for the primary sb.
- Treat newly added device more seriously
If the super block write into the newly added device failed, it will
prevent the fs to be mounted, as btrfs can not find the device.
So here we introduce a concept, critical device, that if sb writeback
failed for those devices, the transaction should be aborted.
- Treat sb writeback error as if the device is missing
And if the fs can not handle the missing device and maintain RW, we
should flip RO.
- Revert failed new device if btrfs_init_new_device() failed
This can be a problem of the new device itself.
We should remove the new device if the btrfs_commit_transaction() call
failed.
All those enhancements lead to a pretty interesting error handling
situation, where a too small device can be added to btrfs, but it will
not commit, and the original fs can still be remounted again correctly:
# lsblk /dev/loop0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 64K 0 loop
# mount /dev/test/scratch1 /mnt/btrfs/
# btrfs device add -f /dev/loop0 /mnt/btrfs/
Performing full device TRIM /dev/loop0 (64.00KiB) ...
ERROR: error adding device '/dev/loop0': Input/output error
# umount /mnt/btrfs
# mount /dev/test/scratch1 /mnt/btrfs
^^^ This will succeed, as the new device is not committed
Qu Wenruo (4):
btrfs: enhance primary super block write error detection
btrfs: return error if the primary super block write to a new device
failed
btrfs: treat super block write back error more seriously
btrfs: add extra error handling for btrfs_init_new_device()
fs/btrfs/disk-io.c | 93 ++++++++++++++++++++++++++++++++++------------
fs/btrfs/volumes.c | 9 +++--
fs/btrfs/volumes.h | 8 +---
3 files changed, 78 insertions(+), 32 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/4] btrfs: enhance primary super block write error detection
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
@ 2025-08-28 4:16 ` Qu Wenruo
2025-08-28 4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 4:16 UTC (permalink / raw)
To: linux-btrfs
Currently btrfs has two functions handling super blocks writeback,
write_dev_super() and wait_dev_super().
However there are some inconsistency and missing corner cases:
- Primary super block writeback can be skipped for tiny devices
Although such small devices shouldn't be added in the first place, we
lack the proper checks inside btrfs_init_new_device().
E.g. a device with only 64K bytes can be added to btrfs, and the
primary super block writeback will be skipped, causing future mounts
to fail, as there will be no way to find that never-to-be-written super
block.
- Inconsistency between write_dev_super() and wait_dev_super()
write_dev_super() will not report error as long as one super block
is submitted.
However wait_dev_super() will report error as long as the primary one
failed.
Enhance the primary super block error detection by:
- Use a device state flag to indicate failed primary super block error
So that we save an atomic_t.
- Always try to submit the primary super block
If the device can not hold the primary super block, treat it as an
error.
This applies to both write_dev_super() and wait_dev_super().
- Return error if the primary super block write/wait failed
This mostly affects write_dev_super(), as wait_dev_super() is already
doing this behavior.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 50 ++++++++++++++++++++++++++++------------------
fs/btrfs/volumes.h | 7 +------
2 files changed, 32 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7b06bbc40898..90577fc73b3b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3690,10 +3690,7 @@ static void btrfs_end_super_write(struct bio *bio)
BTRFS_DEV_STAT_WRITE_ERRS);
/* Ensure failure if the primary sb fails. */
if (bio->bi_opf & REQ_FUA)
- atomic_add(BTRFS_SUPER_PRIMARY_WRITE_ERROR,
- &device->sb_write_errors);
- else
- atomic_inc(&device->sb_write_errors);
+ set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR, &device->dev_state);
}
folio_unlock(fi.folio);
folio_put(fi.folio);
@@ -3718,11 +3715,12 @@ static int write_dev_supers(struct btrfs_device *device,
struct btrfs_fs_info *fs_info = device->fs_info;
struct address_space *mapping = device->bdev->bd_mapping;
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+ bool primary_failed = false;
int i;
int ret;
u64 bytenr, bytenr_orig;
- atomic_set(&device->sb_write_errors, 0);
+ clear_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR, &device->dev_state);
if (max_mirrors == 0)
max_mirrors = BTRFS_SUPER_MIRROR_MAX;
@@ -3738,17 +3736,26 @@ static int write_dev_supers(struct btrfs_device *device,
bytenr_orig = btrfs_sb_offset(i);
ret = btrfs_sb_log_location(device, i, WRITE, &bytenr);
if (ret == -ENOENT) {
+ if (i == 0)
+ primary_failed = true;
continue;
} else if (ret < 0) {
btrfs_err(device->fs_info,
"couldn't get super block location for mirror %d error %d",
i, ret);
- atomic_inc(&device->sb_write_errors);
+ if (i == 0) {
+ set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+ &device->dev_state);
+ primary_failed = true;
+ }
continue;
}
if (bytenr + BTRFS_SUPER_INFO_SIZE >=
- device->commit_total_bytes)
+ device->commit_total_bytes) {
+ if (i == 0)
+ primary_failed = true;
break;
+ }
btrfs_set_super_bytenr(sb, bytenr_orig);
@@ -3763,7 +3770,11 @@ static int write_dev_supers(struct btrfs_device *device,
btrfs_err(device->fs_info,
"couldn't get super block page for bytenr %llu error %ld",
bytenr, PTR_ERR(folio));
- atomic_inc(&device->sb_write_errors);
+ if (i == 0) {
+ set_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+ &device->dev_state);
+ primary_failed = true;
+ }
continue;
}
@@ -3793,10 +3804,11 @@ static int write_dev_supers(struct btrfs_device *device,
bio->bi_opf |= REQ_FUA;
submit_bio(bio);
- if (btrfs_advance_sb_log(device, i))
- atomic_inc(&device->sb_write_errors);
+ btrfs_advance_sb_log(device, i);
}
- return atomic_read(&device->sb_write_errors) < i ? 0 : -1;
+ if (primary_failed)
+ return -1;
+ return 0;
}
/*
@@ -3809,7 +3821,6 @@ static int write_dev_supers(struct btrfs_device *device,
static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
{
int i;
- int errors = 0;
bool primary_failed = false;
int ret;
u64 bytenr;
@@ -3824,14 +3835,16 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
if (ret == -ENOENT) {
break;
} else if (ret < 0) {
- errors++;
if (i == 0)
primary_failed = true;
continue;
}
if (bytenr + BTRFS_SUPER_INFO_SIZE >=
- device->commit_total_bytes)
+ device->commit_total_bytes) {
+ if (i == 0)
+ primary_failed = true;
break;
+ }
folio = filemap_get_folio(device->bdev->bd_mapping,
bytenr >> PAGE_SHIFT);
@@ -3844,16 +3857,15 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
folio_put(folio);
}
- errors += atomic_read(&device->sb_write_errors);
- if (errors >= BTRFS_SUPER_PRIMARY_WRITE_ERROR)
- primary_failed = true;
+ if (!primary_failed)
+ primary_failed = test_bit(BTRFS_DEV_STATE_SB_WRITE_ERROR,
+ &device->dev_state);
if (primary_failed) {
btrfs_err(device->fs_info, "error writing primary super block to device %llu",
device->devid);
return -1;
}
-
- return errors < i ? 0 : -1;
+ return 0;
}
/*
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2cbf8080eade..f69bc72680fb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -100,6 +100,7 @@ enum btrfs_raid_types {
#define BTRFS_DEV_STATE_REPLACE_TGT (3)
#define BTRFS_DEV_STATE_FLUSH_SENT (4)
#define BTRFS_DEV_STATE_NO_READA (5)
+#define BTRFS_DEV_STATE_SB_WRITE_ERROR (6)
/* Special value encoding failure to write primary super block. */
#define BTRFS_SUPER_PRIMARY_WRITE_ERROR (INT_MAX / 2)
@@ -155,12 +156,6 @@ struct btrfs_device {
/* type and info about this device */
u64 type;
- /*
- * Counter of super block write errors, values larger than
- * BTRFS_SUPER_PRIMARY_WRITE_ERROR encode primary super block write failure.
- */
- atomic_t sb_write_errors;
-
/* minimal io size for this device */
u32 sector_size;
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
2025-08-28 4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
@ 2025-08-28 4:16 ` Qu Wenruo
2025-08-28 4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 4:16 UTC (permalink / raw)
To: linux-btrfs
When adding a new device to btrfs, btrfs adds the device and relying on
the regular transaction commit to put a new super block to that device.
However during that transaction commit, if the primary super block write
back failed for that device, the fs will not be able to be mounted
unless degraded.
This is because the primary super block failed to be written to that new
device, thus btrfs is unable to locate that new device thus it will be
treated as missing.
This means for the initial transaction commit on that new device, it is
more critical than other devices.
Treat primary super block write back to a new device as a critical error,
so that if that error happened, we will abort the transaction.
Although transaction abort is annoying, it keeps the fs to be mountable.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 41 +++++++++++++++++++++++++++++++++++++----
fs/btrfs/volumes.c | 3 ++-
fs/btrfs/volumes.h | 1 +
3 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 90577fc73b3b..9a8c07a0986d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3999,6 +3999,21 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
return min_tolerated;
}
+static bool is_critical_device(struct btrfs_device *dev)
+{
+ /*
+ * New device primary super block writeback is not tolerated.
+ *
+ * As if a power loss after the current transaction, the new device
+ * has no primary super block, and btrfs will refuse to mount.
+ * Although it's still possible to mount the fs degraded since
+ * there is no bgs on that device, it's better to error out now.
+ */
+ if (test_bit(BTRFS_DEV_STATE_NEW, &dev->dev_state))
+ return true;
+ return false;
+}
+
int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
{
struct list_head *head;
@@ -4009,6 +4024,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
int do_barriers;
int max_errors;
int total_errors = 0;
+ bool critical_error = false;
u64 flags;
do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
@@ -4039,6 +4055,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
}
}
+ /* Start from newly added device, to detect problems of them early. */
list_for_each_entry(dev, head, dev_list) {
if (!dev->bdev) {
total_errors++;
@@ -4074,10 +4091,18 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
}
ret = write_dev_supers(dev, sb, max_mirrors);
- if (ret)
+ if (ret) {
total_errors++;
+ if (is_critical_device(dev)) {
+ btrfs_crit(fs_info,
+ "failed to write super blocks for device %llu",
+ dev->devid);
+ critical_error = true;
+ break;
+ }
+ }
}
- if (total_errors > max_errors) {
+ if (total_errors > max_errors || critical_error) {
btrfs_err(fs_info, "%d errors while writing supers",
total_errors);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -4098,11 +4123,19 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
continue;
ret = wait_dev_supers(dev, max_mirrors);
- if (ret)
+ if (ret) {
total_errors++;
+ if (is_critical_device(dev)) {
+ btrfs_crit(fs_info,
+ "failed to wait super blocks for device %llu",
+ dev->devid);
+ critical_error = true;
+ break;
+ }
+ }
}
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- if (total_errors > max_errors) {
+ if (total_errors > max_errors || critical_error) {
btrfs_handle_fs_error(fs_info, -EIO,
"%d errors while writing supers",
total_errors);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a031aafe253f..2106190e972b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2766,6 +2766,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
}
set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+ set_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
device->generation = trans->transid;
device->io_width = fs_info->sectorsize;
device->io_align = fs_info->sectorsize;
@@ -2865,7 +2866,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
}
ret = btrfs_commit_transaction(trans);
-
+ clear_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
if (seeding_dev) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f69bc72680fb..8c571f66acff 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -101,6 +101,7 @@ enum btrfs_raid_types {
#define BTRFS_DEV_STATE_FLUSH_SENT (4)
#define BTRFS_DEV_STATE_NO_READA (5)
#define BTRFS_DEV_STATE_SB_WRITE_ERROR (6)
+#define BTRFS_DEV_STATE_NEW (7)
/* Special value encoding failure to write primary super block. */
#define BTRFS_SUPER_PRIMARY_WRITE_ERROR (INT_MAX / 2)
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/4] btrfs: treat super block write back error more seriously
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
2025-08-28 4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
2025-08-28 4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
@ 2025-08-28 4:16 ` Qu Wenruo
2025-08-28 4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
2025-08-28 5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 4:16 UTC (permalink / raw)
To: linux-btrfs
Currently btrfs treats super block writebacks pretty loosely.
As long as there is at least one device writes its super block back
correctly, we will call it a day and continue.
However this is too loose, we should treat super block writeback error
more seriously, as the writeback error as the device is missing.
And if we hit a device missing, we should flips the fs RO if the fs can
no longer maintain reliable RW writes.
So here enhance the is_critical_device() handling, and if the failed
device will prevent btrfs from maintaining reliable RW operations,
treat it as a critical error, which will abort the current transaction
and flips the fs RO.
Although this brings some overhead for error handling. Now we call
btrfs_check_rw_degradable(), which depends on the profiles, it may needs
to iterate through all chunk maps to make sure the fs can still maintain
RW operations.
But that only affects error path, I'd say one should not expect full
performance when the device can not handle a critical super block
writeback.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9a8c07a0986d..aa7696a28ac2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3999,7 +3999,7 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
return min_tolerated;
}
-static bool is_critical_device(struct btrfs_device *dev)
+static bool is_critical_device(struct btrfs_fs_info *fs_info, struct btrfs_device *dev)
{
/*
* New device primary super block writeback is not tolerated.
@@ -4011,6 +4011,8 @@ static bool is_critical_device(struct btrfs_device *dev)
*/
if (test_bit(BTRFS_DEV_STATE_NEW, &dev->dev_state))
return true;
+ if (!btrfs_check_rw_degradable(fs_info, dev))
+ return true;
return false;
}
@@ -4093,7 +4095,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
ret = write_dev_supers(dev, sb, max_mirrors);
if (ret) {
total_errors++;
- if (is_critical_device(dev)) {
+ if (is_critical_device(fs_info, dev)) {
btrfs_crit(fs_info,
"failed to write super blocks for device %llu",
dev->devid);
@@ -4125,7 +4127,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
ret = wait_dev_supers(dev, max_mirrors);
if (ret) {
total_errors++;
- if (is_critical_device(dev)) {
+ if (is_critical_device(fs_info, dev)) {
btrfs_crit(fs_info,
"failed to wait super blocks for device %llu",
dev->devid);
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device()
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
` (2 preceding siblings ...)
2025-08-28 4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
@ 2025-08-28 4:16 ` Qu Wenruo
2025-08-28 5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 4:16 UTC (permalink / raw)
To: linux-btrfs
With all the new error handling for super block writebacks, now a tiny
device add will fail as expected:
# lsblk /dev/loop0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 64K 0 loop
# mount /dev/test/scratch1 /mnt/btrfs/
# btrfs device add -f /dev/loop0 /mnt/btrfs/
Performing full device TRIM /dev/loop0 (64.00KiB) ...
ERROR: error adding device '/dev/loop0': Input/output error
And the on-disk super block is still pointing to the older chunk tree:
# btrfs ins dump-tree -t chunk /dev/test/scratch1
...
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
devid 1 total_bytes 10737418240 bytes_used 562036736
io_align 4096 io_width 4096 sector_size 4096 type 0
generation 0 start_offset 0 dev_group 0
seek_speed 0 bandwidth 0
uuid 87358ef3-2cd2-4a4c-b607-4cba0cc80a3b
fsid 26f7058c-2604-4293-a580-6efc653b849d
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
But after the above error, we still can not properly mount the original
fs again:
# umount /mnt/btrfs
# mount /dev/test/scratch1 /mnt/btrfs
mount: /mnt/btrfs: can't read superblock on /dev/mapper/test-scratch1.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n 3
BTRFS info (device dm-2): using crc32c (crc32c-lib) checksum algorithm
BTRFS error (device dm-2): failed to read devices
BTRFS error (device dm-2): open_ctree failed: -5
The mount failure is caused by the missing error handling of the
btrfs_commit_transaction() inside btrfs_init_new_device().
If that call failed, we should remove the new device, or it will pollute
the fs_devices, causing newer mounts to fail until the device is
forgotten manually.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2106190e972b..0f2a93824db1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2867,14 +2867,16 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
ret = btrfs_commit_transaction(trans);
clear_bit(BTRFS_DEV_STATE_NEW, &device->dev_state);
+ if (ret < 0) {
+ trans = NULL;
+ goto error_sysfs;
+ }
+
if (seeding_dev) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
locked = false;
- if (ret) /* transaction commit */
- return ret;
-
ret = btrfs_relocate_sys_chunks(fs_info);
if (ret < 0)
btrfs_handle_fs_error(fs_info, ret,
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/4] btrfs: enhance primary super block writeback errors
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
` (3 preceding siblings ...)
2025-08-28 4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
@ 2025-08-28 5:20 ` Qu Wenruo
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-28 5:20 UTC (permalink / raw)
To: linux-btrfs
在 2025/8/28 13:46, Qu Wenruo 写道:
> Mark recently exposed a bug that btrfs can add zero sized devices into
> an fs.
>
> Inspired by his fix, I also exposed a situation which is pretty
> concerning:
>
> # lsblk /dev/loop0
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> loop0 7:0 0 64K 0 loop
> # mount /dev/test/scratch1 /mnt/btrfs/
> # btrfs device add -f /dev/loop0 /mnt/btrfs/
> Performing full device TRIM /dev/loop0 (64.00KiB) ...
> # umount /mnt/btrfs
> # mount /dev/test/scratch1 /mnt/btrfs
> mount: /mnt/btrfs: fsconfig() failed: No such file or directory.
> dmesg(1) may have more information after failed mount system call.
> # dmesg -t | tail -n4
> BTRFS info (device dm-3): using crc32c (crc32c-lib) checksum algorithm
> BTRFS error (device dm-3): devid 2 uuid e449b62e-faca-4cbd-b8cb-bcfb5c549f0b is missing
> BTRFS error (device dm-3): failed to read chunk tree: -2
> BTRFS error (device dm-3): open_ctree failed: -2
>
> That device is too small to even have the primary super block, thus
> btrfs just skips it, resulting the fs unable to find the new device in
> the next mount.
> (Thankfully this can be fixed by mounting degraded and remove the tiny
> device)
>
> This exposed several problems related to the super block writeback
> behavior:
>
> - We should always try to writeback the primary super block
> If the device is too small, it shouldn't be added in the first place.
>
> And even if such device is added, we should hit an critical error
> during the first transaction on that device.
>
> - Primary super block write failure is ignored in write_dev_supers()
> Unlike wait_dev_supers(), if we failed to submit the primary sb, but
> succeeded submitting the backup ones, we still call it a aday.
>
> - We treat super block writeback errors too loosely
> Btrfs will not error out even if there is only one device finished the
> super block.
>
> Enhance the error handling by:
>
> - Treat primary sb writeback error more seriously
> In fact we don't care that much about backups, and wait_dev_supers()
> is already doing that.
>
> So we don't need an atomic_t to track all sb writeback errors, but
> only a single flag for the primary sb.
>
> - Treat newly added device more seriously
> If the super block write into the newly added device failed, it will
> prevent the fs to be mounted, as btrfs can not find the device.
>
> So here we introduce a concept, critical device, that if sb writeback
> failed for those devices, the transaction should be aborted.
>
> - Treat sb writeback error as if the device is missing
> And if the fs can not handle the missing device and maintain RW, we
> should flip RO.
This makes btrfs more strict on dmerror related errors.
Previously we ignore those primary sb writes errors, thus it won't have
long lasting effects.
But now we flips the fs RO, result btrfs/146 and btrfs/160 to fail, all
due to the new RO behaviors.
Considering we're just being more strict, I'm not that concerned of
those failures.
If we are determine to go this path, we can definitely enhance the test
cases to handle those errors.
Thanks,
Qu
>
> - Revert failed new device if btrfs_init_new_device() failed
> This can be a problem of the new device itself.
> We should remove the new device if the btrfs_commit_transaction() call
> failed.
>
>
> All those enhancements lead to a pretty interesting error handling
> situation, where a too small device can be added to btrfs, but it will
> not commit, and the original fs can still be remounted again correctly:
>
> # lsblk /dev/loop0
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> loop0 7:0 0 64K 0 loop
> # mount /dev/test/scratch1 /mnt/btrfs/
> # btrfs device add -f /dev/loop0 /mnt/btrfs/
> Performing full device TRIM /dev/loop0 (64.00KiB) ...
> ERROR: error adding device '/dev/loop0': Input/output error
> # umount /mnt/btrfs
> # mount /dev/test/scratch1 /mnt/btrfs
> ^^^ This will succeed, as the new device is not committed
>
> Qu Wenruo (4):
> btrfs: enhance primary super block write error detection
> btrfs: return error if the primary super block write to a new device
> failed
> btrfs: treat super block write back error more seriously
> btrfs: add extra error handling for btrfs_init_new_device()
>
> fs/btrfs/disk-io.c | 93 ++++++++++++++++++++++++++++++++++------------
> fs/btrfs/volumes.c | 9 +++--
> fs/btrfs/volumes.h | 8 +---
> 3 files changed, 78 insertions(+), 32 deletions(-)
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-28 5:20 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28 4:16 [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
2025-08-28 4:16 ` [PATCH 1/4] btrfs: enhance primary super block write error detection Qu Wenruo
2025-08-28 4:16 ` [PATCH 2/4] btrfs: return error if the primary super block write to a new device failed Qu Wenruo
2025-08-28 4:16 ` [PATCH 3/4] btrfs: treat super block write back error more seriously Qu Wenruo
2025-08-28 4:16 ` [PATCH 4/4] btrfs: add extra error handling for btrfs_init_new_device() Qu Wenruo
2025-08-28 5:20 ` [PATCH 0/4] btrfs: enhance primary super block writeback errors Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).