* [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device
@ 2022-08-22 7:15 Qu Wenruo
2022-08-22 8:00 ` Graham Cobb
0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2022-08-22 7:15 UTC (permalink / raw)
To: linux-btrfs
[PROBLEM]
Command "btrfs scrub start" can be executed with a single device or a
mount point.
If executed with a single device, it will only start a scrub for that
device.
This is fine for SINGLE/DUP/RAID1*/RAID10/RAID0, as they are all
mirror/simple-stripe based profiles.
But this can be a problem for RAID56: during a RAID56 scrub, data
stripes are treated as RAID0, and only when scrubbing the P/Q stripes
or doing a data rebuild will we try a full stripe read/rebuild.
This means, for the following case:
Dev 1: |<--- Data stripe 1 (good) -->|
Dev 2: |<--- Data stripe 2 (good) -->|
Dev 3: |<--- Parity stripe (bad) -->|
If we only scrub dev 1 or dev 2, the corrupted parity on dev 3 will not
be repaired, breaking the one-corrupted/missing-device tolerance
(if we then lose device 1 or 2, no data can be recovered for this full
stripe).
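To make the failure mode concrete, here is a minimal XOR-parity sketch
(illustrative Python only, a hypothetical 2-data + 1-parity layout, not
btrfs code): if the stale parity is never re-checked by the scrub,
reconstruction after a device loss silently returns garbage:

```python
# Minimal RAID5-style XOR parity sketch (illustrative only, not btrfs code).
d1 = bytes([0x11, 0x22, 0x33, 0x44])           # data stripe 1 on dev 1 (good)
d2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])           # data stripe 2 on dev 2 (good)
parity = bytes(a ^ b for a, b in zip(d1, d2))  # what dev 3 *should* hold

bad_parity = bytes([0x00] * 4)                 # corrupted parity, never scrubbed

# Scrubbing only dev 1 or dev 2 verifies d1/d2 against their csums and
# passes, so bad_parity stays on disk untouched.  Now dev 1 dies and we
# must rebuild d1 from parity XOR d2:
recovered_d1 = bytes(p ^ b for p, b in zip(bad_parity, d2))

assert recovered_d1 != d1  # reconstruction yields garbage, data is lost
```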
Unfortunately, among existing btrfs RAID56 users there seems to be a
trend of using single device scrub instead of a full fs scrub.
[CAUSE]
This is due to the bad design of btrfs_scrub_dev(), which treats data
stripes just like RAID0.
This means a full fs scrub will read data stripes several times (2x for
RAID5, 3x for RAID6), not only causing more IO but also much slower
reads for each device.
At least the end users should be fully informed, and able to choose
their poison (until a better ioctl is introduced).
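As a back-of-envelope sketch of the amplification claimed above
(hypothetical Python helper, not btrfs code; assumption: each parity
stripe scrub re-reads the whole stripe to recompute P/Q):

```python
# Reads of one data stripe during a full scrub of a RAID56 fs, under the
# assumption stated above.
def data_stripe_reads(nparity):
    direct = 1          # read once as a plain data stripe (RAID0-style)
    rebuilds = nparity  # re-read once per parity stripe verification
    return direct + rebuilds

assert data_stripe_reads(nparity=1) == 2  # RAID5: 2x IO on data stripes
assert data_stripe_reads(nparity=2) == 3  # RAID6: 3x IO on data stripes
```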
[WORKAROUND]
This patch will introduce the following workarounds:
- Extra warning for "btrfs scrub start <dev>"
  If we detect a single device scrub on a fs with the RAID56 feature,
  we output a warning for the user
  (but still allow the command to execute)
- Extra man page warning.
  Now there is an extra section on scrub and RAID56.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
BTW there is a WIP new ioctl, btrfs_scrub_fs(), that is going to change
the situation for RAID56 by not only reducing the IO but also providing
better error reporting (including P/Q mismatch cases) and simpler,
streamlined verification code.
Thus in the future we may switch to the new ioctl, and in that case
this warning will no longer be needed.
---
Documentation/btrfs-scrub.rst | 13 +++++++++++++
cmds/scrub.c | 29 +++++++++++++++++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/Documentation/btrfs-scrub.rst b/Documentation/btrfs-scrub.rst
index 75079eecc9ef..bbf342d4b6c8 100644
--- a/Documentation/btrfs-scrub.rst
+++ b/Documentation/btrfs-scrub.rst
@@ -48,6 +48,19 @@ start [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata>] <path>|<device>
configured similar to the ``ionice(1)`` syntax using *-c* and *-n*
options. Note that not all IO schedulers honor the ionice settings.
+ .. warning::
+ Using ``btrfs scrub start <device>`` on a RAID56 fs is strongly
+ discouraged.
+
+ Due to the bad design of RAID56 scrub, single device scrubbing
+ will tread the data stripes as RAID0, only for data recovery
+ (data stripes has bad csum) or P/Q stripes we do full stripe
+ checks.
+
+ This means, if running ``btrfs scrub start <device>``,
+ corruption in P/Q stripes has a high chance not to be repaired,
+ greatly reducing the robustness of RAID56.
+
``Options``
-B
diff --git a/cmds/scrub.c b/cmds/scrub.c
index 7c2d9b79c275..a69de8c1402b 100644
--- a/cmds/scrub.c
+++ b/cmds/scrub.c
@@ -42,6 +42,7 @@
#include "kernel-shared/volumes.h"
#include "kernel-shared/disk-io.h"
#include "common/open-utils.h"
+#include "common/path-utils.h"
#include "common/units.h"
#include "cmds/commands.h"
#include "common/help.h"
@@ -1143,6 +1144,7 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
bool resume)
{
int fdmnt;
+ int sysfsfd = -1;
int prg_fd = -1;
int fdres = -1;
int ret;
@@ -1188,6 +1190,8 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
DIR *dirstream = NULL;
int force = 0;
int nothing_to_resume = 0;
+ bool is_block_dev = false;
+ bool is_raid56 = false;
while ((c = getopt(argc, argv, "BdqrRc:n:f")) != -1) {
switch (c) {
@@ -1242,6 +1246,9 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
path = argv[optind];
+ if (path_is_block_device(path))
+ is_block_dev = true;
+
fdmnt = open_path_or_dev_mnt(path, &dirstream, !do_quiet);
if (fdmnt < 0)
return 1;
@@ -1254,6 +1261,28 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
err = 1;
goto out;
}
+
+ sysfsfd = sysfs_open_fsid_file(fdmnt, "features/raid56");
+ if (sysfsfd >= 0) {
+ is_raid56 = true;
+ close(sysfsfd);
+ }
+
+ /*
+ * If we're scrubbing a single device on fs with RAID56, that scrub
+ * will not verify the data on the other stripes unless we're scrubbing
+ * a P/Q stripe.
+ * Due to the current scrub_dev ioctl design, we can cause way more
+ * IO for data stripes, thus full scrub on RAID56 can be slow.
+ *
+ * Some one uses single device scrub to speed up the progress, but the
+ * hidden cost of corrupted/unscrubbed data must not be hidden.
+ */
+ if (is_raid56 && is_block_dev) {
+ warning("scrub single device of a btrfs RAID56 is not recommened.");
+ warning("Check 'btrfs-scrub'(8) for more details");
+ }
+
if (!fi_args.num_devices) {
error_on(!do_quiet, "no devices found");
err = 1;
--
2.37.2
* Re: [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device
2022-08-22 7:15 [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device Qu Wenruo
@ 2022-08-22 8:00 ` Graham Cobb
2022-08-22 8:07 ` Qu Wenruo
From: Graham Cobb @ 2022-08-22 8:00 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 22/08/2022 08:15, Qu Wenruo wrote:
> [...]
> + .. warning::
> + Using ``btrfs scrub start <device>`` on a RAID56 fs is strongly
> + discouraged.
It isn't immediately obvious to the reader that you mean to distinguish
the single-device case from the mounted-filesystem case here. How about:
Using single device scrub on a RAID56 filesystem (``btrfs scrub start
<device>``) is strongly discouraged.
> +
> + Due to the bad design of RAID56 scrub, single device scrubbing
> + will tread the data stripes as RAID0, only for data recovery
> + (data stripes has bad csum) or P/Q stripes we do full stripe
> + checks.
> +
> + This means, if running ``btrfs scrub start <device>``,
> + corruption in P/Q stripes has a high chance not to be repaired,
> + greatly reducing the robustness of RAID56.
How about replacing both the above paragraphs with:
The current implementation of single device scrubbing for RAID56
filesystems will only check data checksums on the specified device, and
will not check parity stripes on other devices, greatly reducing the
robustness of RAID56.
For full scrub and repair of RAID56 filesystems always use ``btrfs scrub
start <path>``.
> [...]
> + if (is_raid56 && is_block_dev) {
> + warning("scrub single device of a btrfs RAID56 is not recommened.");
Recommended
Graham
* Re: [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device
2022-08-22 8:00 ` Graham Cobb
@ 2022-08-22 8:07 ` Qu Wenruo
From: Qu Wenruo @ 2022-08-22 8:07 UTC (permalink / raw)
To: Graham Cobb, linux-btrfs
On 2022/8/22 16:00, Graham Cobb wrote:
> On 22/08/2022 08:15, Qu Wenruo wrote:
>> [...]
>> + .. warning::
>> + Using ``btrfs scrub start <device>`` on a RAID56 fs is strongly
>> + discouraged.
>
> It isn't immediately obvious to the reader that you mean to distinguish
> the single-device case from the mounted-filesystem case here. How about:
Well, I intentionally avoid mentioning "single device" until we have
mentioned "btrfs scrub start <device>" first, as I'm not confident
every reader will understand "single device scrub".
>
> Using single device scrub on a RAID56 filesystem (``btrfs scrub start
> <device>``) is strongly discouraged.
>
>> +
>> + Due to the bad design of RAID56 scrub, single device scrubbing
>> + will tread the data stripes as RAID0, only for data recovery
>> + (data stripes has bad csum) or P/Q stripes we do full stripe
>> + checks.
>> +
>> + This means, if running ``btrfs scrub start <device>``,
>> + corruption in P/Q stripes has a high chance not to be repaired,
>> + greatly reducing the robustness of RAID56.
>
> How about replacing both the above paragraphs with:
>
> The current implementation of single device scrubbing for RAID56
> filesystems will only check data checksums on the specified device, and
> will not check parity stripes on other devices, greatly reducing the
> robustness of RAID56.
The problem is, this ignores the technical detail that single device
scrub on RAID56 can still check the full stripe, if a) we're scrubbing
the P/Q stripes on that device, or b) the data stripes on that device
have mismatched csums.
I still want the end users to have a correct view of RAID56, knowing
what single device scrub does and what it doesn't do.
The above example would unnecessarily give a very bad impression of the
whole scrub idea.
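[Editorial sketch of the two escalation cases described above, as a
hypothetical Python helper, not btrfs code:]

```python
# When does a single-device RAID56 scrub escalate to a full-stripe check?
def full_stripe_check(stripe_kind, csum_ok):
    if stripe_kind in ("P", "Q"):  # (a) parity stripe on this device
        return True
    if stripe_kind == "data" and not csum_ok:
        return True                # (b) data must be rebuilt from the full stripe
    return False                   # healthy data stripe: RAID0-style read only

assert full_stripe_check("P", csum_ok=True)
assert full_stripe_check("data", csum_ok=False)
assert not full_stripe_check("data", csum_ok=True)
```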
Thanks,
Qu