* [PATCH 0/3] btrfs: introduce 3 debug sysfs interface to tweak the error handling behavior
From: Qu Wenruo @ 2023-09-24 6:14 UTC
To: linux-btrfs
During a very interesting (and weird) debugging session, it turned out
that btrfs ignores a lot of write errors until we hit some critical
location; only then does btrfs start reacting, normally by aborting the
transaction.
This can be problematic for developers, as sometimes we want to catch
the earliest sign of trouble. Continuing without any obvious errors
(other than kernel error messages) can make debugging much harder.
On the other hand, I totally understand that if a single sector fails
to be written and we mark the whole fs read-only, it can be super
frustrating for regular end users, thus we cannot make it the default
behavior.
So this patchset would introduce the following sysfs entries under
/sys/fs/btrfs/<uuid>/debug/:
- allow_backup_super_failure
RW, binary (0 or 1), determines if btrfs would tolerate backup super
blocks writeback failure.
If set to 0 and a failure is hit, btrfs would treat backup super
blocks writeback failure as critical (the same level as primary super
blocks).
The default value is 1, so the default behavior is not changed.
NOTE: this doesn't mean such failure would immediately lead to
transaction abort. Check `super_failure_tolerance` for more details.
- allow_data_failure
RW, binary (0 or 1), determines if btrfs would tolerate data sectors
writeback failure.
If set to 0 and a failure is hit, btrfs would flip read-only
immediately.
The default value is 1, so the default behavior is not changed.
- super_failure_tolerance
RW, s8, determines the tolerance for devices super blocks writeback.
By default btrfs allows "nr_devices - 1" devices to fail their super
blocks writeback. This means if we have 5 disks, btrfs allows 4 to fail
their super block writeback.
If the value >= 0, the value itself would be the tolerance.
If the value < 0, nr_devices + the value would be the tolerance.
If value + nr_devices is still negative, btrfs would allow all devices
to fail their super blocks writeback (aka, very dangerous).
The default value is -1, to match the existing behavior.
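For testing, the entries above can be driven from a small user-space
helper. The following is only a minimal sketch: set_debug_knob() is a
hypothetical name, not part of this patchset, and it assumes a kernel
built with CONFIG_BTRFS_DEBUG and with these patches applied, so that
the files under /sys/fs/btrfs/<uuid>/debug/ exist. It writes a decimal
value and reads it back.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patchset): write a decimal value
 * to a sysfs-style knob file, then read the value back.
 *
 * Returns 0 on success and stores the value read back in *readback,
 * or a negative errno on failure.
 */
int set_debug_knob(const char *path, int value, int *readback)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -EIO;
	}
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		return -errno;

	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (fscanf(f, "%d", readback) != 1) {
		fclose(f);
		return -EINVAL;
	}
	fclose(f);
	return 0;
}
```

A caller would pass the full path of one entry, e.g. the
allow_data_failure file under the debug directory of the mounted fs,
with <uuid> replaced by the real filesystem UUID.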
There will be another one for the btrfs bio layer, but I have found
something weird in the code, thus it will only be introduced after I
have solved the problem there. Meanwhile we can discuss the usefulness
of this patchset.
Qu Wenruo (3):
btrfs: introduce allow_backup_super_failure sysfs interface
btrfs: introduce super_failure_tolerance sysfs interface
btrfs: introduce allow_data_failure sysfs interface
fs/btrfs/disk-io.c | 35 +++++++++++++---
fs/btrfs/extent_io.c | 8 +++-
fs/btrfs/fs.h | 23 ++++++++++
fs/btrfs/inode.c | 9 +++-
fs/btrfs/sysfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 167 insertions(+), 7 deletions(-)
--
2.42.0
* [PATCH 1/3] btrfs: introduce allow_backup_super_failure sysfs interface
From: Qu Wenruo @ 2023-09-24 6:14 UTC
To: linux-btrfs
Currently btrfs allows the backup super blocks to fail their writeback,
as long as the primary one is still fine.
This tolerance may be a little too loose for some debug purposes, thus
this patch introduces the following sysfs interface:
/sys/fs/btrfs/<uuid>/debug/allow_backup_super_failure
It is a read-write entry whose content is 0 or 1, indicating if we allow
backup super blocks to fail their writeback.
The default value is 1, meaning we allow backup super blocks to fail
their writeback.
Writing anything but 0 would set the value to 1.
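The per-device decision this changes can be modelled in user space.
This is just a sketch of the condition used at the end of
wait_dev_supers() in the diff below; dev_super_write_failed() is a
hypothetical name for illustration, and "errors" is assumed to count
every failed super block copy on the device, primary included.

```c
#include <stdbool.h>

/*
 * User-space model (hypothetical, for illustration only) of the check
 * "primary_failed || (!allow_super_failure && errors)".
 *
 * Returns true when the device would be reported as having failed its
 * super block writeback.
 */
bool dev_super_write_failed(bool primary_failed, int errors,
			    bool allow_backup_super_failure)
{
	/* A failed primary super block is always fatal for the device. */
	if (primary_failed)
		return true;
	/* Backup failures only matter when the knob is set to 0. */
	return !allow_backup_super_failure && errors != 0;
}
```

With the default (knob set to 1) only a primary failure marks the
device as failed; with the knob set to 0 a single backup failure is
enough.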
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 7 +++++--
fs/btrfs/fs.h | 3 +++
fs/btrfs/sysfs.c | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dc577b3c53f6..d8eb968e9e5e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2722,6 +2722,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
INIT_LIST_HEAD(&fs_info->allocated_roots);
INIT_LIST_HEAD(&fs_info->allocated_ebs);
spin_lock_init(&fs_info->eb_leak_lock);
+ fs_info->allow_backup_super_failure = true;
#endif
extent_map_tree_init(&fs_info->mapping_tree);
btrfs_init_block_rsv(&fs_info->global_block_rsv,
@@ -3841,8 +3842,10 @@ static int write_dev_supers(struct btrfs_device *device,
*/
static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
{
+ struct btrfs_fs_info *fs_info = device->fs_info;
int i;
int errors = 0;
+ bool allow_super_failure = READ_ONCE(fs_info->allow_backup_super_failure);
bool primary_failed = false;
int ret;
u64 bytenr;
@@ -3890,8 +3893,8 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
}
/* log error, force error return */
- if (primary_failed) {
- btrfs_err(device->fs_info, "error writing primary super block to device %llu",
+ if (primary_failed || (!allow_super_failure && errors)) {
+ btrfs_err(device->fs_info, "error writing super block to device %llu",
device->devid);
return -1;
}
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 19f9a444bcd8..2dff41cb463d 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -685,6 +685,9 @@ struct btrfs_fs_info {
struct btrfs_work qgroup_rescan_work;
/* Protected by qgroup_rescan_lock */
bool qgroup_rescan_running;
+
+ /* If we allow backup superblocks writeback to fail. */
+ bool allow_backup_super_failure;
u8 qgroup_drop_subtree_thres;
/*
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 8b75e974f30b..852090622a76 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -614,12 +614,49 @@ static const struct attribute *discard_attrs[] = {
#ifdef CONFIG_BTRFS_DEBUG
+static ssize_t allow_backup_super_failure_show(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ char *buf)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+
+ ASSERT(fs_info);
+ return sysfs_emit(buf, "%d\n",
+ READ_ONCE(fs_info->allow_backup_super_failure));
+}
+
+static ssize_t allow_backup_super_failure_store(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+ s8 new_number;
+ int ret;
+
+ ASSERT(fs_info);
+
+ ret = kstrtos8(buf, 10, &new_number);
+ if (ret)
+ return -EINVAL;
+ WRITE_ONCE(fs_info->allow_backup_super_failure, !!new_number);
+ return len;
+}
+BTRFS_ATTR_RW(debug, allow_backup_super_failure, allow_backup_super_failure_show,
+ allow_backup_super_failure_store);
+
/*
* Per-filesystem runtime debugging exported via sysfs.
*
* Path: /sys/fs/btrfs/UUID/debug/
+ *
+ * - allow_backup_super_failure
+ * RW, binary (0/1), determines if we allow backup superblock writeback to fail.
+ *
+ * NOTE: Even with this set to 0, btrfs may still allow some errors to
+ * happen as btrfs can tolerate up to "rw_devs - 1" failures.
*/
static const struct attribute *btrfs_debug_mount_attrs[] = {
+ BTRFS_ATTR_PTR(debug, allow_backup_super_failure),
NULL,
};
--
2.42.0
* [PATCH 2/3] btrfs: introduce super_failure_tolerance sysfs interface
From: Qu Wenruo @ 2023-09-24 6:14 UTC
To: linux-btrfs
Currently btrfs has a questionable tolerance on how many devices can
fail their super blocks writeback: it allows "num_devices - 1" devices
to fail.
This can already be problematic for multi-device btrfses, but
unfortunately I don't have anything better for now.
Instead this patch allows debug builds to configure the tolerance
through the new sysfs interface:
/sys/fs/btrfs/<uuid>/debug/super_failure_tolerance
This value is s8. For values >= 0 it's the tolerance number directly.
E.g. if the value is 0, we do not allow any device to fail its super
block writeback.
If the value is 2, and the fs only has 2 devices, it means we allow all
devices to fail their super block writeback (aka, very dangerous).
If the value is negative, then the tolerance is num_devices plus this
value.
E.g. if the value is -1 (default), and we have 2 devices, it means the
tolerance is 1 (at most one device can fail).
If the value is -2, and we have 1 device, this means we allow all
devices to fail (again, very dangerous).
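The rules above can be checked with a small user-space model. This is
only a sketch under assumptions: model_max_super_errors() is a
hypothetical name, and it follows the wording of the cover letter, i.e.
all devices may fail only when num_devices plus the value is negative,
so the default of -1 keeps the old "num_devices - 1" tolerance even for
a single device.

```c
#include <limits.h>

/*
 * User-space model (hypothetical, for illustration only) of how the
 * tolerance value maps to the maximum number of devices allowed to fail
 * their super block writeback.
 *
 * INT_MAX stands for "all devices may fail".
 */
int model_max_super_errors(int num_devs, int tolerance)
{
	/* Non-negative values are used directly. */
	if (tolerance >= 0)
		return tolerance;
	/* num_devs + tolerance would be negative: no limit at all. */
	if (-tolerance > num_devs)
		return INT_MAX;
	return num_devs + tolerance;
}
```

For example, (num_devs = 2, tolerance = -1) gives 1, (2, 2) gives 2,
and (1, -2) gives INT_MAX, matching the three cases described above.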
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 27 ++++++++++++++++++++++++---
fs/btrfs/fs.h | 18 ++++++++++++++++++
fs/btrfs/sysfs.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 72 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d8eb968e9e5e..062e28ac94b1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2723,6 +2723,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
INIT_LIST_HEAD(&fs_info->allocated_ebs);
spin_lock_init(&fs_info->eb_leak_lock);
fs_info->allow_backup_super_failure = true;
+ fs_info->super_failure_tolerance = -1;
#endif
extent_map_tree_init(&fs_info->mapping_tree);
btrfs_init_block_rsv(&fs_info->global_block_rsv,
@@ -4033,6 +4034,26 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
return min_tolerated;
}
+static int calculate_max_super_errors(struct btrfs_fs_info *fs_info)
+{
+ int num_devs = btrfs_super_num_devices(fs_info->super_copy);
+ int tolerance_value = READ_ONCE(fs_info->super_failure_tolerance);
+
+ if (tolerance_value >= 0)
+ return tolerance_value;
+
+ ASSERT(num_devs >= 0);
+
+ /*
+ * Now tolerance_value is negative, check if
+ * abs(@tolerance_value) is > @num_devs. If so we allow all devices
+ * to fail.
+ */
+ if (-tolerance_value > num_devs)
+ return INT_MAX;
+ return num_devs + tolerance_value;
+}
+
int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
{
struct list_head *head;
@@ -4060,7 +4081,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
mutex_lock(&fs_info->fs_devices->device_list_mutex);
head = &fs_info->fs_devices->devices;
- max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
+ max_errors = calculate_max_super_errors(fs_info);
if (do_barriers) {
ret = barrier_all_devices(fs_info);
@@ -4138,8 +4159,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
if (total_errors > max_errors) {
btrfs_handle_fs_error(fs_info, -EIO,
- "%d errors while writing supers",
- total_errors);
+ "failed to write supers: errors %d tolerance %d",
+ total_errors, max_errors);
return -EIO;
}
return 0;
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 2dff41cb463d..7608a1cf612f 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -688,6 +688,24 @@ struct btrfs_fs_info {
/* If we allow backup superblocks writeback to fail. */
bool allow_backup_super_failure;
+
+ /*
+ * Tolerance on how many devices can fail their superblock writeback.
+ *
+ * If the value >= 0, then the value itself is the tolerance.
+ * If the value < 0, then (num_devices + value) is the tolerance.
+ *
+ * Default value is -1.
+ *
+ * E.g. 0 means we do not allow any device to fail its super blocks writeback.
+ *
+ * If there are 3 devices and the value is -1, then it means we allow up to 2
+ * devices to fail their super blocks writeback.
+ *
+ * If there are 3 devices and the value is -4 or lower, we would allow all
+ * devices to fail their super blocks writeback, which can be very DANGEROUS!
+ */
+ s8 super_failure_tolerance;
u8 qgroup_drop_subtree_thres;
/*
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 852090622a76..bd9f574c2471 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -644,6 +644,35 @@ static ssize_t allow_backup_super_failure_store(struct kobject *debug_kobj,
BTRFS_ATTR_RW(debug, allow_backup_super_failure, allow_backup_super_failure_show,
allow_backup_super_failure_store);
+static ssize_t super_failure_tolerance_show(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ char *buf)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+
+ ASSERT(fs_info);
+ return sysfs_emit(buf, "%d\n",
+ READ_ONCE(fs_info->super_failure_tolerance));
+}
+
+static ssize_t super_failure_tolerance_store(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+ s8 new_number;
+ int ret;
+
+ ASSERT(fs_info);
+
+ ret = kstrtos8(buf, 10, &new_number);
+ if (ret)
+ return -EINVAL;
+ WRITE_ONCE(fs_info->super_failure_tolerance, new_number);
+ return len;
+}
+BTRFS_ATTR_RW(debug, super_failure_tolerance, super_failure_tolerance_show,
+ super_failure_tolerance_store);
/*
* Per-filesystem runtime debugging exported via sysfs.
*
@@ -657,6 +686,7 @@ BTRFS_ATTR_RW(debug, allow_backup_super_failure, allow_backup_super_failure_show
*/
static const struct attribute *btrfs_debug_mount_attrs[] = {
BTRFS_ATTR_PTR(debug, allow_backup_super_failure),
+ BTRFS_ATTR_PTR(debug, super_failure_tolerance),
NULL,
};
--
2.42.0
* [PATCH 3/3] btrfs: introduce allow_data_failure sysfs interface
From: Qu Wenruo @ 2023-09-24 6:14 UTC
To: linux-btrfs
Currently if btrfs fails to write data blocks, it will not really cause
any great damage, but mostly -EIO for the involved writeback functions
like fsync() or direct IO for that inode.
Normally it's not a big deal, but it can be an indicator of a bigger
problem (e.g. unreliable hardware).
Thus this patch allows debug builds to toggle if any data writeback
failure is allowed:
/sys/fs/btrfs/<uuid>/debug/allow_data_failure
The entry is read-write. 0 means the fs would not tolerate any data
writeback failure, and would fall read-only after such a failure.
The default value is 1.
The default value is 1.
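To show how such a failure looks from user space, here is a minimal
sketch of a buffered write followed by fsync(). write_and_sync() is a
hypothetical test helper, not part of this patch. A data writeback
failure would normally surface as a negative return value (typically
-EIO) from the fsync() step, whether or not allow_data_failure is set.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical test helper: write @len bytes from @buf to @path and
 * force them to disk. Returns 0 on success or a negative errno.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int ret = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			ret = -errno;
			break;
		}
		if (n == 0) {
			/* No progress on a non-empty write, give up. */
			ret = -EIO;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Writeback errors for buffered data are reported here. */
	if (ret == 0 && fsync(fd) < 0)
		ret = -errno;
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

With allow_data_failure set to 0, the same failure would additionally
flip the fs read-only, so later writes from the test program would be
expected to fail as well.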
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 1 +
fs/btrfs/extent_io.c | 8 +++++++-
fs/btrfs/fs.h | 2 ++
fs/btrfs/inode.c | 9 ++++++++-
fs/btrfs/sysfs.c | 32 ++++++++++++++++++++++++++++++++
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 062e28ac94b1..160f8f6b906d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2723,6 +2723,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
INIT_LIST_HEAD(&fs_info->allocated_ebs);
spin_lock_init(&fs_info->eb_leak_lock);
fs_info->allow_backup_super_failure = true;
+ fs_info->allow_data_failure = true;
fs_info->super_failure_tolerance = -1;
#endif
extent_map_tree_init(&fs_info->mapping_tree);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5e5852a4ffb5..95725c5027de 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -483,8 +483,14 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
bvec->bv_offset, bvec->bv_len);
btrfs_finish_ordered_extent(bbio->ordered, page, start, len, !error);
- if (error)
+ if (error) {
mapping_set_error(page->mapping, error);
+ if (!READ_ONCE(fs_info->allow_data_failure))
+ btrfs_handle_fs_error(fs_info, -EIO,
+ "data write back failed, root %lld ino %llu fileoff %llu",
+ BTRFS_I(inode)->root->root_key.objectid,
+ btrfs_ino(BTRFS_I(inode)), start);
+ }
btrfs_page_clear_writeback(fs_info, page, start, len);
}
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 7608a1cf612f..fa26ae33a29d 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -689,6 +689,8 @@ struct btrfs_fs_info {
/* If we allow backup superblocks writeback to fail. */
bool allow_backup_super_failure;
+ /* If we allow data writeback to fail. */
+ bool allow_data_failure;
/*
* Tolerance on how many devices can fail their superblock writeback.
*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 514d2e8a4f52..4388eeced1bf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7703,13 +7703,20 @@ static void btrfs_dio_end_io(struct btrfs_bio *bbio)
struct btrfs_dio_private *dip =
container_of(bbio, struct btrfs_dio_private, bbio);
struct btrfs_inode *inode = bbio->inode;
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct bio *bio = &bbio->bio;
if (bio->bi_status) {
- btrfs_warn(inode->root->fs_info,
+ btrfs_warn(fs_info,
"direct IO failed ino %llu op 0x%0x offset %#llx len %u err no %d",
btrfs_ino(inode), bio->bi_opf,
dip->file_offset, dip->bytes, bio->bi_status);
+ if (!READ_ONCE(fs_info->allow_data_failure))
+ btrfs_handle_fs_error(fs_info, -EIO,
+ "direct IO data write back failed, root %lld ino %llu fileoff %llu len %u",
+ inode->root->root_key.objectid,
+ btrfs_ino(inode), dip->file_offset,
+ dip->bytes);
}
if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index bd9f574c2471..a32a7b2d1b7a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -673,6 +673,37 @@ static ssize_t super_failure_tolerance_store(struct kobject *debug_kobj,
}
BTRFS_ATTR_RW(debug, super_failure_tolerance, super_failure_tolerance_show,
super_failure_tolerance_store);
+
+static ssize_t allow_data_failure_show(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ char *buf)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+
+ ASSERT(fs_info);
+ return sysfs_emit(buf, "%d\n",
+ READ_ONCE(fs_info->allow_data_failure));
+}
+
+static ssize_t allow_data_failure_store(struct kobject *debug_kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ struct btrfs_fs_info *fs_info = to_fs_info(debug_kobj->parent);
+ s8 new_number;
+ int ret;
+
+ ASSERT(fs_info);
+
+ ret = kstrtos8(buf, 10, &new_number);
+ if (ret)
+ return -EINVAL;
+ WRITE_ONCE(fs_info->allow_data_failure, !!new_number);
+ return len;
+}
+BTRFS_ATTR_RW(debug, allow_data_failure, allow_data_failure_show,
+ allow_data_failure_store);
+
/*
* Per-filesystem runtime debugging exported via sysfs.
*
@@ -686,6 +717,7 @@ BTRFS_ATTR_RW(debug, super_failure_tolerance, super_failure_tolerance_show,
*/
static const struct attribute *btrfs_debug_mount_attrs[] = {
BTRFS_ATTR_PTR(debug, allow_backup_super_failure),
+ BTRFS_ATTR_PTR(debug, allow_data_failure),
BTRFS_ATTR_PTR(debug, super_failure_tolerance),
NULL,
};
--
2.42.0