Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Graham Cobb <g.btrfs@cobb.uk.net>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device
Date: Mon, 22 Aug 2022 16:07:54 +0800	[thread overview]
Message-ID: <ecfb1ac1-1b1d-e2da-82b8-fd6b0627acd4@gmx.com> (raw)
In-Reply-To: <8c703e22-3425-0e7a-23d3-41793d7ed6e7@cobb.uk.net>



On 2022/8/22 16:00, Graham Cobb wrote:
> On 22/08/2022 08:15, Qu Wenruo wrote:
>> [PROBLEM]
>>
>> Command "btrfs scrub start" can be executed with single device or a
>> mount point.
>>
>> If executed with single device, it will only start a scrub for that
>> device.
>>
>> This is fine for SINGLE/DUP/RAID1*/RAID10/RAID0, as they are all
>> mirror/simple stripe based profiles.
>>
>> But this can be a problem for RAID56, as for RAID56 scrub, data stripes
>> are treated as RAID0, and only when scrubbing the P/Q stripes or doing
>> a data rebuild do we try a full stripe read/rebuild.
>>
>> This means, for the following case:
>>
>> Dev 1:  |<--- Data stripe 1 (good) -->|
>> Dev 2:  |<--- Data stripe 2 (good) -->|
>> Dev 3:  |<--- Parity stripe (bad)  -->|
>>
>> If we only scrub dev 1 or dev 2, the corrupted parity on dev 3 will not
>> be repaired, breaking the one corrupted/missing device tolerance.
>> (if we lost device 1 or 2, no data can be recovered for this full
>> stripe).
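
To make the quoted example concrete: in RAID5 the parity stripe is the
bytewise XOR of the data stripes, so checking only the data stripes (whose
csums are intact) can never notice corrupted parity. A minimal sketch in
plain Python (illustrative only, not btrfs code):

```python
# Minimal RAID5 model: parity is the bytewise XOR of the data stripes.
def xor_parity(d1: bytes, d2: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(d1, d2))

data1 = b"\x11" * 4           # data stripe on dev 1 (good)
data2 = b"\x22" * 4           # data stripe on dev 2 (good)
parity = xor_parity(data1, data2)

# Simulate silent corruption of the parity stripe on dev 3.
bad_parity = b"\x00" * 4

# Scrubbing only dev 1 or dev 2 verifies data csums, which still pass,
# so the bad parity goes unnoticed.
assert data1 == b"\x11" * 4 and data2 == b"\x22" * 4

# Only a full-stripe check (reading all three devices) sees the mismatch:
assert xor_parity(data1, data2) != bad_parity

# If dev 1 is later lost, rebuilding from dev 2 + bad parity yields wrong data:
rebuilt = xor_parity(data2, bad_parity)
assert rebuilt != data1
```

Only the full-stripe check catches the mismatch, and once a data device is
lost, the stale parity silently reconstructs wrong data.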
>>
>> Unfortunately for the existing btrfs RAID56 users, there seems to be a
>> trend of using single device scrub instead of full fs scrub.
>>
>> [CAUSE]
>>
>> This is due to the bad design of btrfs_scrub_dev(), which treats data
>> stripes just like RAID0.
>> This means we will read data stripes several times (2x for RAID5, 3x
>> for RAID6), not only causing more IO but also much slower reads for
>> each device.
>>
>> At least the end users should be fully informed, and choose their poison
>> to drink (until a better ioctl is introduced).
>>
>> [WORKAROUND]
>>
>> This patch will introduce the following workarounds:
>>
>> - Extra warning for "btrfs scrub start <dev>"
>>    If we detect we're doing single device scrub, and the fs has RAID56
>>    feature, we output a warning for the user.
>>    (But still allow the command to execute)
>>
>> - Extra man page warning.
>>    Now there is an extra section for scrub and RAID56.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> BTW there is a WIP new ioctl, btrfs_scrub_fs(), that is going to change
>> the situation for RAID56, by not only reducing the IO for RAID56, but
>> with better error reporting (including P/Q mismatch cases), and simpler
>> and streamlined code for verification.
>>
>> Thus in the future, we may switch to the new ioctl and in that case, the
>> error message will no longer be needed for the new ioctl.
>> ---
>>   Documentation/btrfs-scrub.rst | 13 +++++++++++++
>>   cmds/scrub.c                  | 29 +++++++++++++++++++++++++++++
>>   2 files changed, 42 insertions(+)
>>
>> diff --git a/Documentation/btrfs-scrub.rst b/Documentation/btrfs-scrub.rst
>> index 75079eecc9ef..bbf342d4b6c8 100644
>> --- a/Documentation/btrfs-scrub.rst
>> +++ b/Documentation/btrfs-scrub.rst
>> @@ -48,6 +48,19 @@ start [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata>] <path>|<device>
>>           configured similar to the ``ionice(1)`` syntax using *-c* and *-n*
>>           options.  Note that not all IO schedulers honor the ionice settings.
>>
>> +        .. warning::
>> +                Using ``btrfs scrub start <device>`` on a RAID56 fs is strongly
>> +                discouraged.
>
> It isn't immediately obvious to the reader that you mean to distinguish
> the single-device case from the mounted-filesystem case here. How about:

Well, I intentionally avoided mentioning "single-device" until
"btrfs scrub start <device>" has been mentioned first.

As I'm not confident that every reader would understand "single
device scrub".


>
> Using single device scrub on a RAID56 filesystem (``btrfs scrub start
> <device>``) is strongly discouraged.
>
>> +
>> +                Due to the bad design of RAID56 scrub, single device scrubbing
>> +                will treat the data stripes as RAID0; only for data recovery
>> +                (data stripes with bad csums) or for P/Q stripes do we do
>> +                full stripe checks.
>> +
>> +                This means that, if running ``btrfs scrub start <device>``,
>> +                corruption in P/Q stripes has a high chance of going
>> +                unrepaired, greatly reducing the robustness of RAID56.
>
> How about replacing both the above paragraphs with:
>
> The current implementation of single device scrubbing for RAID56
> filesystems will only check data checksums on the specified device, and
> will not check parity stripes on other devices, greatly reducing the
> robustness of RAID56.

The problem is, this ignores the technical detail that single device
scrub on RAID56 can still check the full stripe, if a) we're scrubbing
the P/Q stripes on that device, or b) the data stripes on that device
have mismatched csums.
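
In other words, whether a full stripe check happens during a single device
scrub can be modelled by a tiny predicate (a hypothetical illustration of
the behaviour described above, not the actual kernel logic):

```python
def full_stripe_checked(stripe_is_pq: bool, data_csum_matches: bool) -> bool:
    """Whether scrubbing this stripe on the single scrubbed device triggers
    a full-stripe check (illustrative model only, not kernel code)."""
    # a) scrubbing a P/Q stripe on this device always reads the full stripe
    if stripe_is_pq:
        return True
    # b) a data stripe with a csum mismatch triggers recovery, which also
    #    reads the full stripe
    if not data_csum_matches:
        return True
    # Otherwise the data stripe is treated like RAID0: parity elsewhere
    # is never verified.
    return False

# A healthy data stripe on the scrubbed device leaves parity unchecked:
assert full_stripe_checked(stripe_is_pq=False, data_csum_matches=True) is False
assert full_stripe_checked(stripe_is_pq=True, data_csum_matches=True) is True
assert full_stripe_checked(stripe_is_pq=False, data_csum_matches=False) is True
```

So a healthy data stripe on the scrubbed device never triggers parity
verification, which is exactly why corruption on an unscrubbed parity
device survives.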

I still want the end users to have a correct view of RAID56, knowing
what single device scrub is doing, and what it isn't doing.

The above wording would unnecessarily give a very bad impression of the
whole scrub idea.

Thanks,
Qu

>
> For full scrub and repair of RAID56 filesystems always use ``btrfs scrub
> start <path>``.
>
>> +
>>           ``Options``
>>
>>           -B
>> diff --git a/cmds/scrub.c b/cmds/scrub.c
>> index 7c2d9b79c275..a69de8c1402b 100644
>> --- a/cmds/scrub.c
>> +++ b/cmds/scrub.c
>> @@ -42,6 +42,7 @@
>>   #include "kernel-shared/volumes.h"
>>   #include "kernel-shared/disk-io.h"
>>   #include "common/open-utils.h"
>> +#include "common/path-utils.h"
>>   #include "common/units.h"
>>   #include "cmds/commands.h"
>>   #include "common/help.h"
>> @@ -1143,6 +1144,7 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
>>   		       bool resume)
>>   {
>>   	int fdmnt;
>> +	int sysfsfd = -1;
>>   	int prg_fd = -1;
>>   	int fdres = -1;
>>   	int ret;
>> @@ -1188,6 +1190,8 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
>>   	DIR *dirstream = NULL;
>>   	int force = 0;
>>   	int nothing_to_resume = 0;
>> +	bool is_block_dev = false;
>> +	bool is_raid56 = false;
>>
>>   	while ((c = getopt(argc, argv, "BdqrRc:n:f")) != -1) {
>>   		switch (c) {
>> @@ -1242,6 +1246,9 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
>>
>>   	path = argv[optind];
>>
>> +	if (path_is_block_device(path))
>> +		is_block_dev = true;
>> +
>>   	fdmnt = open_path_or_dev_mnt(path, &dirstream, !do_quiet);
>>   	if (fdmnt < 0)
>>   		return 1;
>> @@ -1254,6 +1261,28 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
>>   		err = 1;
>>   		goto out;
>>   	}
>> +
>> +	sysfsfd = sysfs_open_fsid_file(fdmnt, "features/raid56");
>> +	if (sysfsfd >= 0) {
>> +		is_raid56 = true;
>> +		close(sysfsfd);
>> +	}
>> +
>> +	/*
>> +	 * If we're scrubbing a single device on a fs with RAID56, that scrub
>> +	 * will not verify the data on the other stripes unless we're scrubbing
>> +	 * a P/Q stripe.
>> +	 * Due to the current scrub_dev ioctl design, we can cause way more
>> +	 * IO for data stripes, thus a full scrub on RAID56 can be slow.
>> +	 *
>> +	 * Someone may use single device scrub to speed up the process, but
>> +	 * the cost of corrupted/unscrubbed data must not be hidden.
>> +	 */
>> +	if (is_raid56 && is_block_dev) {
>> +		warning("scrub single device of a btrfs RAID56 is not recommened.");
>
> Recommended
>
>> +		warning("Check 'btrfs-scrub'(8) for more details");
>> +	}
>> +
>>   	if (!fi_args.num_devices) {
>>   		error_on(!do_quiet, "no devices found");
>>   		err = 1;
>
> Graham


Thread overview: 3+ messages
2022-08-22  7:15 [PATCH] btrfs-progs: scrub: add extra warning for scrub on single device Qu Wenruo
2022-08-22  8:00 ` Graham Cobb
2022-08-22  8:07   ` Qu Wenruo [this message]
