* [v6 00/16] Btrfs-progs offline scrub
@ 2018-01-05 11:01 Gu Jinxiang
2018-01-05 11:01 ` [v6 01/16] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Gu Jinxiang
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs
For anyone who wants to try it, it can be fetched from my repo:
https://github.com/gujx2017/btrfs-progs/tree/offline-scrub/
In this v6, the series is just rebased to v4.14 and a test for offline scrub is added.
Several reports of kernel scrub screwing up good data stripes have been on the ML for some time.
And since kernel scrub doesn't account P/Q corruption, it is quite
hard to detect errors such as the kernel screwing up P/Q while scrubbing.
To get a comparable tool for kernel scrub, we need a user-space tool to act as a benchmark against which the different behaviors can be compared.
So here is the patchset for user-space scrub.
It can do:
1) All mirror/backup check for non-parity based stripes
Which means for RAID1/DUP/RAID10, we can really check all mirrors,
not just the 1st good mirror.
The current "--check-data-csum" option should eventually be replaced
by offline scrub, as "--check-data-csum" doesn't really check all
mirrors: once it hits a good copy, the remaining copies are simply
ignored.
In the v4 update, the data check was further improved, inspired by
kernel behavior: data extents are now checked sector by sector, so
the following corruption case can be handled:
Data extent A contains data from 0~28K.
And |///| = corrupted | | = good
0 4k 8k 12k 16k 20k 24k 28k
Mirror 0 |///| |///| |///| | |
Mirror 1 | |///| |///| |///| |
Extent A should be reported as RECOVERABLE, while in v3 we treated
data extent A as a whole unit, so the above case was reported as
CORRUPTED.
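The per-sector rule above can be sketched stand-alone (an illustration only, with a hypothetical extent_recoverable() helper — not the patchset's actual code): an extent is recoverable as long as every sector is intact in at least one mirror.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the v4 sector-by-sector check.
 * corrupt[m * nr_sectors + i] is non-zero when sector i of mirror m
 * failed its csum check.  The extent is recoverable iff every sector
 * has at least one good mirror.
 */
static int extent_recoverable(const int *corrupt, int nr_mirrors,
			      int nr_sectors)
{
	int i, m;

	for (i = 0; i < nr_sectors; i++) {
		int good = 0;

		for (m = 0; m < nr_mirrors; m++)
			if (!corrupt[m * nr_sectors + i])
				good = 1;
		if (!good)
			return 0;	/* all copies of this sector are bad */
	}
	return 1;
}
```

With the 0~28K example above (seven 4K sectors, mirror 0 bad at sectors 0/2/4, mirror 1 bad at 1/3/5), every sector still has one good copy, so the extent is recoverable.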
2) RAID5/6 full stripe check
It makes full use of btrfs csums (both tree and data).
It will only recover the full stripe if all recovered data matches
its csums.
NOTE: Due to the lack of good bitmap facilities, RAID56 sector by
sector repair will be quite complex, especially when NODATASUM is
involved.
So the current RAID56 code doesn't support vertical sector recovery yet.
Data extent A contains data from 0~64K
And |///| = corrupted while | | = good
0 8K 16K 24K 32K 40K 48K 56K 64K
Data stripe 0 |///| |///| |///| |///| |
Data stripe 1 | |///| |///| |///| |///|
Parity | | | | | | | | |
The kernel will recover it, while the current scrub will report it as
CORRUPTED.
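The RAID5 half of the recovery step can be sketched stand-alone (a minimal illustration with a hypothetical raid5_rebuild() helper; the patchset actually uses the raid56 code copied into kernel-lib, and only accepts a rebuilt stripe if the data then matches its csum — RAID6 Q recovery additionally needs GF(2^8) math not shown here):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: rebuild one lost stripe of a RAID5 full stripe
 * by XOR-ing all surviving stripes (parity included, as the last
 * element of @stripes).  @lost is the index of the stripe to rebuild.
 */
static void raid5_rebuild(unsigned char **stripes, int nr_stripes,
			  int lost, size_t len)
{
	size_t i;
	int s;

	for (i = 0; i < len; i++) {
		unsigned char v = 0;

		for (s = 0; s < nr_stripes; s++)
			if (s != lost)
				v ^= stripes[s][i];
		stripes[lost][i] = v;
	}
}
```

The scrub's extra requirement is the acceptance test: after the XOR pass, the rebuilt data stripe is only written back if it verifies against the csum tree.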
3) Repair
In the v4 update, repair was finally added.
This patchset also introduces a new btrfs_map_block() function, which
is more flexible than the current btrfs_map_block() and has a unified interface for all profiles, not just an extra array for RAID56.
Check the 1st and 2nd patches for details.
It is already used in the RAID5/6 scrub, but can be used for other profiles too.
The to-do list has been shortened, since repair was added in the v4 update.
1) Test cases
Need to make the test infrastructure able to handle multi-device
filesystems first.
2) Make btrfsck able to handle RAID5 with a missing device
Currently it won't even open a RAID5 btrfs with a missing device, even
though scrub should be able to handle it.
3) RAID56 vertical sector repair
Although I consider this case minor compared to RAID1 vertical
sector repair: for RAID1, an extent can be as large as 128M, while for
RAID56 one stripe is always 64K, much smaller than the RAID1 case,
making the possibility lower.
I prefer to add this function after the patchset gets merged, as no
one really likes getting 20 mails every time I update the patchset.
For those who want to review the patchset, there is a slide deck of the
basic function relationships, which I hope will reduce the time needed
to understand what the patchset is doing:
https://docs.google.com/presentation/d/1tAU3lUVaRUXooSjhFaDUeyW3wauHDSg9H-AiLBOSuIM/edit?usp=sharing
Changelog:
V0.8 RFC:
Initial RFC patchset
v1:
First formal patchset.
RAID6 recovery support added, mainly copied from the kernel raid6 lib.
Cleaner recovery logic.
v2:
More comments in both code and commit message, suggested by David.
File re-arrangement: no check/ dir, raid56.c/h moved to kernel-lib,
suggested by David.
v3:
Put the "--offline" option into scrub, rather than into fsck.
Use bitmap to read multiple csums in one run, to improve performance.
Add --progress/--no-progress options, to tell the user we're not just
wasting CPU and IO.
v4:
Improve the data check: data extents are now checked sector by sector.
Add repair support.
v5:
Just some small fixups of comments in the remaining 15 patches,
according to problems pointed out by David when merging the first
5 patches of this patchset.
And rebase it to 93a9004dde410d920f08f85c6365e138713992d8.
v6:
Rebase to v4.14.
Add a test for offline-scrub.
Gu Jinxiang (1):
btrfs-progs: add test for offline-scrub
Qu Wenruo (15):
btrfs-progs: Introduce new btrfs_map_block function which returns more
unified result.
btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
btrfs-progs: csum: Introduce function to read out data csums
btrfs-progs: scrub: Introduce structures to support offline scrub for
RAID56
btrfs-progs: scrub: Introduce functions to scrub mirror based tree
block
btrfs-progs: scrub: Introduce functions to scrub mirror based data
blocks
btrfs-progs: scrub: Introduce function to scrub one mirror-based
extent
btrfs-progs: scrub: Introduce function to scrub one data stripe
btrfs-progs: scrub: Introduce function to verify parities
btrfs-progs: extent-tree: Introduce function to check if there is any
extent in given range.
btrfs-progs: scrub: Introduce function to recover data parity
btrfs-progs: scrub: Introduce helper to write a full stripe
btrfs-progs: scrub: Introduce a function to scrub one full stripe
btrfs-progs: scrub: Introduce function to check a whole block group
btrfs-progs: scrub: Introduce offline scrub function
Documentation/btrfs-scrub.asciidoc | 9 +
Makefile | 9 +-
cmds-scrub.c | 116 +-
csum.c | 130 ++
ctree.h | 12 +
disk-io.c | 4 +-
disk-io.h | 2 +
extent-tree.c | 60 +
kerncompat.h | 3 +
scrub.c | 1363 ++++++++++++++++++++
tests/scrub-tests.sh | 43 +
tests/scrub-tests/001-offline-scrub-raid10/test.sh | 50 +
utils.h | 11 +
volumes.c | 282 ++++
volumes.h | 78 ++
15 files changed, 2164 insertions(+), 8 deletions(-)
create mode 100644 csum.c
create mode 100644 scrub.c
create mode 100755 tests/scrub-tests.sh
create mode 100755 tests/scrub-tests/001-offline-scrub-raid10/test.sh
--
2.14.3
^ permalink raw reply [flat|nested] 17+ messages in thread
* [v6 01/16] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result.
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 02/16] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Gu Jinxiang
` (14 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, __btrfs_map_block_v2().
Unlike the old btrfs_map_block(), which needs different parameters to
handle different RAID profiles, this new function uses a unified
btrfs_map_block structure to handle all RAID profiles in a more
meaningful way:
return the physical address along with the logical address for each stripe.
For RAID1/Single/DUP (non-striped), the result will look like:
Map block: Logical 128M, Len 10M, Type RAID1, Stripe len 0, Nr_stripes 2
Stripe 0: Logical 128M, Physical X, Len: 10M Dev dev1
Stripe 1: Logical 128M, Physical Y, Len: 10M Dev dev2
The result will be as long as possible, since it's not striped at all.
For RAID0/10 (striped without parity), the result will be aligned to
the full stripe size:
Map block: Logical 64K, Len 128K, Type RAID10, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 64K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical 128K, Physical Z, Len 64K Dev dev3
Stripe 3: Logical 128K, Physical W, Len 64K Dev dev4
For RAID5/6 (striped with parity and device rotation), the result will
be aligned to the full stripe size:
Map block: Logical 64K, Len 128K, Type RAID6, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 128K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical RAID5_P, Physical Z, Len 64K Dev dev3
Stripe 3: Logical RAID6_Q, Physical W, Len 64K Dev dev4
The new unified layout should be very flexible and can even handle
things like N-way RAID1 (which the old mirror_num based interface
can't handle well).
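The full-stripe alignment behind these layouts can be sketched stand-alone (a hypothetical helper mirroring the arithmetic in fill_full_map_block(); not the patch's actual code): the full stripe size is stripe_len times the number of data stripes (divided by sub_stripes for RAID10), and a logical address is rounded down to the start of its full stripe.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the full-stripe geometry computed by
 * __btrfs_map_block_v2() for striped profiles.
 */
struct fstripe_geom {
	unsigned long long logical;	/* full stripe start, logical */
	unsigned long long phy_off;	/* full stripe offset on each dev */
};

static struct fstripe_geom fstripe_of(unsigned long long bg_start,
				      unsigned long long logical,
				      unsigned long long stripe_len,
				      int data_stripes, int sub_stripes)
{
	unsigned long long fsize = stripe_len * data_stripes;
	unsigned long long bg_off = logical - bg_start;
	struct fstripe_geom g;

	if (sub_stripes > 1)	/* RAID10: sub-stripes are mirror copies */
		fsize /= sub_stripes;
	g.logical = bg_off / fsize * fsize + bg_start;	/* round down */
	g.phy_off = bg_off / fsize * stripe_len;
	return g;
}
```

For the RAID6 example above (64K stripe_len, 2 data stripes), any logical address in [128K, 256K) of the block group maps to the full stripe starting at 128K, at per-device offset 64K.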
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
volumes.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
volumes.h | 78 +++++++++++++++++++++++++++
2 files changed, 259 insertions(+)
diff --git a/volumes.c b/volumes.c
index ce3a5405..2d23712a 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1620,6 +1620,187 @@ out:
return 0;
}
+static inline struct btrfs_map_block *alloc_map_block(int num_stripes)
+{
+ struct btrfs_map_block *ret;
+ int size;
+
+ size = sizeof(struct btrfs_map_stripe) * num_stripes +
+ sizeof(struct btrfs_map_block);
+ ret = malloc(size);
+ if (!ret)
+ return NULL;
+ memset(ret, 0, size);
+ return ret;
+}
+
+static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
+ struct btrfs_map_block *map_block)
+{
+ u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+ u64 bg_start = map->ce.start;
+ u64 bg_end = bg_start + map->ce.size;
+ u64 bg_offset = start - bg_start; /* offset inside the block group */
+ u64 fstripe_logical = 0; /* Full stripe start logical bytenr */
+ u64 fstripe_size = 0; /* Full stripe logical size */
+ u64 fstripe_phy_off = 0; /* Full stripe offset in each dev */
+ u32 stripe_len = map->stripe_len;
+ int sub_stripes = map->sub_stripes;
+ int data_stripes = nr_data_stripes(map);
+ int dev_rotation;
+ int i;
+
+ map_block->num_stripes = map->num_stripes;
+ map_block->type = profile;
+
+ /*
+ * Common full stripe data for stripe based profiles
+ */
+ if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10 |
+ BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+ fstripe_size = stripe_len * data_stripes;
+ if (sub_stripes)
+ fstripe_size /= sub_stripes;
+ fstripe_logical = bg_offset / fstripe_size * fstripe_size +
+ bg_start;
+ fstripe_phy_off = bg_offset / fstripe_size * stripe_len;
+ }
+
+ switch (profile) {
+ case BTRFS_BLOCK_GROUP_DUP:
+ case BTRFS_BLOCK_GROUP_RAID1:
+ case 0: /* SINGLE */
+ /*
+ * Non-striped mode (Single, DUP and RAID1)
+ * Just use offset to fill map_block
+ */
+ map_block->stripe_len = 0;
+ map_block->start = start;
+ map_block->length = min(bg_end, start + length) - start;
+ for (i = 0; i < map->num_stripes; i++) {
+ struct btrfs_map_stripe *stripe;
+
+ stripe = &map_block->stripes[i];
+
+ stripe->dev = map->stripes[i].dev;
+ stripe->logical = start;
+ stripe->physical = map->stripes[i].physical + bg_offset;
+ stripe->length = map_block->length;
+ }
+ break;
+ case BTRFS_BLOCK_GROUP_RAID10:
+ case BTRFS_BLOCK_GROUP_RAID0:
+ /*
+ * Stripe modes without parity (0 and 10)
+ * Return the whole full stripe
+ */
+
+ map_block->start = fstripe_logical;
+ map_block->length = fstripe_size;
+ map_block->stripe_len = map->stripe_len;
+ for (i = 0; i < map->num_stripes; i++) {
+ struct btrfs_map_stripe *stripe;
+ u64 cur_offset;
+
+ /* Handle RAID10 sub stripes */
+ if (sub_stripes)
+ cur_offset = i / sub_stripes * stripe_len;
+ else
+ cur_offset = stripe_len * i;
+ stripe = &map_block->stripes[i];
+
+ stripe->dev = map->stripes[i].dev;
+ stripe->logical = fstripe_logical + cur_offset;
+ stripe->length = stripe_len;
+ stripe->physical = map->stripes[i].physical +
+ fstripe_phy_off;
+ }
+ break;
+ case BTRFS_BLOCK_GROUP_RAID5:
+ case BTRFS_BLOCK_GROUP_RAID6:
+ /*
+ * Stripe modes with parity and device rotation (5 and 6)
+ *
+ * Return the whole full stripe
+ */
+
+ dev_rotation = (bg_offset / fstripe_size) % map->num_stripes;
+
+ map_block->start = fstripe_logical;
+ map_block->length = fstripe_size;
+ map_block->stripe_len = map->stripe_len;
+ for (i = 0; i < map->num_stripes; i++) {
+ struct btrfs_map_stripe *stripe;
+ int dest_index;
+ u64 cur_offset = stripe_len * i;
+
+ stripe = &map_block->stripes[i];
+
+ dest_index = (i + dev_rotation) % map->num_stripes;
+ stripe->dev = map->stripes[dest_index].dev;
+ stripe->length = stripe_len;
+ stripe->physical = map->stripes[dest_index].physical +
+ fstripe_phy_off;
+ if (i < data_stripes) {
+ /* data stripe */
+ stripe->logical = fstripe_logical +
+ cur_offset;
+ } else if (i == data_stripes) {
+ /* P */
+ stripe->logical = BTRFS_RAID5_P_STRIPE;
+ } else {
+ /* Q */
+ stripe->logical = BTRFS_RAID6_Q_STRIPE;
+ }
+ }
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
+ u64 length, struct btrfs_map_block **map_ret)
+{
+ struct cache_extent *ce;
+ struct map_lookup *map;
+ struct btrfs_map_block *map_block;
+ int ret;
+
+ /* Early parameter check */
+ if (!length || !map_ret) {
+ error("wrong parameter for %s", __func__);
+ return -EINVAL;
+ }
+
+ ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, logical);
+ if (!ce)
+ return -ENOENT;
+ if (ce->start > logical)
+ return -ENOENT;
+
+ map = container_of(ce, struct map_lookup, ce);
+ /*
+ * Allocate a full map_block anyway
+ *
+ * For write, we need the full map_block anyway.
+ * For read, it will be trimmed to the needed stripes before returning.
+ */
+ map_block = alloc_map_block(map->num_stripes);
+ if (!map_block)
+ return -ENOMEM;
+ ret = fill_full_map_block(map, logical, length, map_block);
+ if (ret < 0) {
+ free(map_block);
+ return ret;
+ }
+ /* TODO: Remove unrelated map_stripes for READ operation */
+
+ *map_ret = map_block;
+ return 0;
+}
+
struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
u8 *uuid, u8 *fsid)
{
diff --git a/volumes.h b/volumes.h
index 11572e78..0fadbdd7 100644
--- a/volumes.h
+++ b/volumes.h
@@ -108,6 +108,51 @@ struct map_lookup {
struct btrfs_bio_stripe stripes[];
};
+struct btrfs_map_stripe {
+ struct btrfs_device *dev;
+
+ /*
+ * Logical address of the stripe start.
+ * Caller should check if this logical is the desired map start.
+ * It's possible that the logical is smaller or larger than the
+ * desired map range.
+ *
+ * For P/Q stripes, it will be BTRFS_RAID5_P_STRIPE
+ * and BTRFS_RAID6_Q_STRIPE.
+ */
+ u64 logical;
+
+ u64 physical;
+
+ /* The length of the stripe */
+ u64 length;
+};
+
+struct btrfs_map_block {
+ /*
+ * The logical start of the whole map block.
+ * For RAID5/6 it will be the bytenr of the full stripe start,
+ * so it's possible that @start is smaller than desired map range
+ * start.
+ */
+ u64 start;
+
+ /*
+ * The logical length of the map block.
+ * For RAID5/6 it will be total data stripe size
+ */
+ u64 length;
+
+ /* Block group type */
+ u64 type;
+
+ /* Stripe length, for non-striped mode it will be 0 */
+ u32 stripe_len;
+
+ int num_stripes;
+ struct btrfs_map_stripe stripes[];
+};
+
#define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
(sizeof(struct btrfs_bio_stripe) * (n)))
#define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
@@ -187,6 +232,39 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
u64 logical, u64 *length,
struct btrfs_multi_bio **multi_ret, int mirror_num,
u64 **raid_map_ret);
+
+/*
+ * TODO: Use this map_block_v2 to replace __btrfs_map_block()
+ *
+ * New btrfs_map_block(): unlike the old one, each stripe contains both
+ * the physical offset *AND* the logical address.
+ * So the caller never needs to care how the stripes/mirrors are
+ * organized, which makes csum checking quite easy.
+ *
+ * Only P/Q based profiles need to care about their P/Q stripes.
+ *
+ * @map_ret example:
+ * Raid1:
+ * Map block: logical=128M len=10M type=RAID1 stripe_len=0 nr_stripes=2
+ * Stripe 0: logical=128M physical=X len=10M dev=devid1
+ * Stripe 1: logical=128M physical=Y len=10M dev=devid2
+ *
+ * Raid10:
+ * Map block: logical=64K len=128K type=RAID10 stripe_len=64K nr_stripes=4
+ * Stripe 0: logical=64K physical=X len=64K dev=devid1
+ * Stripe 1: logical=64K physical=Y len=64K dev=devid2
+ * Stripe 2: logical=128K physical=Z len=64K dev=devid3
+ * Stripe 3: logical=128K physical=W len=64K dev=devid4
+ *
+ * Raid6:
+ * Map block: logical=64K len=128K type=RAID6 stripe_len=64K nr_stripes=4
+ * Stripe 0: logical=64K physical=X len=64K dev=devid1
+ * Stripe 1: logical=128K physical=Y len=64K dev=devid2
+ * Stripe 2: logical=RAID5_P physical=Z len=64K dev=devid3
+ * Stripe 3: logical=RAID6_Q physical=W len=64K dev=devid4
+ */
+int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
+ u64 length, struct btrfs_map_block **map_ret);
int btrfs_next_bg(struct btrfs_fs_info *map_tree, u64 *logical,
u64 *size, u64 type);
static inline int btrfs_next_bg_metadata(struct btrfs_fs_info *fs_info,
--
2.14.3
* [v6 02/16] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
2018-01-05 11:01 ` [v6 01/16] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 03/16] btrfs-progs: csum: Introduce function to read out data csums Gu Jinxiang
` (13 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
For READ, the caller normally hopes to get exactly what was requested,
rather than the full stripe map.
In that case, we should remove the unrelated stripe maps, as in the
following example:
32K 96K
|<-request range->|
0 64k 128K
RAID0: | Data 1 | Data 2 |
disk1 disk2
Before this patch, we return the full stripe:
Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2
After this patch, we limit the stripe result to the request range:
Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2
And if it's a RAID5/6 stripe, we just handle it like RAID0, ignoring
the parities.
This should make it easier for callers to use.
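The clamping can be sketched as a stand-alone helper (hypothetical code; the patch's actual logic lives in remove_unrelated_stripes()): intersect the stripe's logical range with the requested range and shift the physical address by however much the logical start moved.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone sketch of trimming one stripe to a
 * requested [start, start + len) range.
 */
struct stripe_range {
	unsigned long long logical;
	unsigned long long physical;
	unsigned long long length;
};

/* Returns 0 (stripe untouched) when the ranges don't overlap */
static int clamp_stripe(struct stripe_range *s,
			unsigned long long start, unsigned long long len)
{
	unsigned long long s_end = s->logical + s->length;
	unsigned long long r_end = start + len;
	unsigned long long new_start, new_end;

	if (s->logical >= r_end || s_end <= start)
		return 0;	/* unrelated stripe: caller deletes it */
	new_start = s->logical > start ? s->logical : start;
	new_end = s_end < r_end ? s_end : r_end;
	s->physical += new_start - s->logical;	/* shift with logical */
	s->logical = new_start;
	s->length = new_end - new_start;
	return 1;
}
```

Running the commit's RAID0 example (request 32K~96K over two 64K stripes) through this sketch yields exactly the trimmed stripes shown above.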
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
volumes.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 102 insertions(+), 1 deletion(-)
diff --git a/volumes.c b/volumes.c
index 2d23712a..72399cde 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1760,6 +1760,107 @@ static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
return 0;
}
+static void del_one_stripe(struct btrfs_map_block *map_block, int i)
+{
+ int cur_nr = map_block->num_stripes;
+ int size_left = (cur_nr - 1 - i) * sizeof(struct btrfs_map_stripe);
+
+ memmove(&map_block->stripes[i], &map_block->stripes[i + 1], size_left);
+ map_block->num_stripes--;
+}
+
+static void remove_unrelated_stripes(struct map_lookup *map,
+ int rw, u64 start, u64 length,
+ struct btrfs_map_block *map_block)
+{
+ int i = 0;
+ /*
+ * RAID5/6 write must use full stripe.
+ * No need to do anything.
+ */
+ if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) &&
+ rw == WRITE)
+ return;
+
+ /*
+ * For RAID0/1/10/DUP, whatever read/write, we can remove unrelated
+ * stripes without causing anything wrong.
+ * RAID5/6 READ is just like RAID0; we don't care about parity unless
+ * we need to recover.
+ * For recovery, rw should be set to WRITE.
+ */
+ while (i < map_block->num_stripes) {
+ struct btrfs_map_stripe *stripe;
+ u64 orig_logical; /* Original stripe logical start */
+ u64 orig_end; /* Original stripe logical end */
+
+ stripe = &map_block->stripes[i];
+
+ /*
+ * For READ, we don't really care about parity
+ */
+ if (stripe->logical == BTRFS_RAID5_P_STRIPE ||
+ stripe->logical == BTRFS_RAID6_Q_STRIPE) {
+ del_one_stripe(map_block, i);
+ continue;
+ }
+ /* Completely unrelated stripe */
+ if (stripe->logical >= start + length ||
+ stripe->logical + stripe->length <= start) {
+ del_one_stripe(map_block, i);
+ continue;
+ }
+ /* Covered stripe, modify its logical and physical */
+ orig_logical = stripe->logical;
+ orig_end = stripe->logical + stripe->length;
+ if (start + length <= orig_end) {
+ /*
+ * |<--range-->|
+ * | stripe |
+ * Or
+ * |<range>|
+ * | stripe |
+ */
+ stripe->logical = max(orig_logical, start);
+ stripe->length = start + length - stripe->logical;
+ stripe->physical += stripe->logical - orig_logical;
+ } else if (start >= orig_logical) {
+ /*
+ * |<-range--->|
+ * | stripe |
+ * Or
+ * |<range>|
+ * | stripe |
+ */
+ stripe->logical = start;
+ stripe->length = min(orig_end, start + length) - start;
+ stripe->physical += stripe->logical - orig_logical;
+ }
+ /*
+ * Remaining case:
+ * |<----range----->|
+ * | stripe |
+ * No need to do any modification
+ */
+ i++;
+ }
+
+ /* Recalculate map_block range */
+ if (!map_block->num_stripes) {
+ map_block->start = 0;
+ map_block->length = 0;
+ return;
+ }
+ map_block->start = map_block->stripes[0].logical;
+ map_block->length = map_block->stripes[0].length;
+ for (i = 1; i < map_block->num_stripes; i++) {
+ struct btrfs_map_stripe *stripe = &map_block->stripes[i];
+ u64 new_start = min(map_block->start, stripe->logical);
+ u64 new_end = max(map_block->start + map_block->length,
+ stripe->logical + stripe->length);
+
+ map_block->start = new_start;
+ map_block->length = new_end - new_start;
+ }
+}
+
int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
u64 length, struct btrfs_map_block **map_ret)
{
@@ -1795,7 +1896,7 @@ int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
free(map_block);
return ret;
}
- /* TODO: Remove unrelated map_stripes for READ operation */
+ remove_unrelated_stripes(map, rw, logical, length, map_block);
*map_ret = map_block;
return 0;
--
2.14.3
* [v6 03/16] btrfs-progs: csum: Introduce function to read out data csums
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
2018-01-05 11:01 ` [v6 01/16] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Gu Jinxiang
2018-01-05 11:01 ` [v6 02/16] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 04/16] btrfs-progs: scrub: Introduce structures to support offline scrub for RAID56 Gu Jinxiang
` (12 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo, Su Yue
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, btrfs_read_data_csums(), to read out the
csums for all sectors in a given range.
This is quite useful for reading out data csums, so we don't need to
open-code it.
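The bitmap bookkeeping behind the new function can be sketched stand-alone (hypothetical helpers approximating calculate_bitmap_len() and the kernel-lib set_bit(); not the patch's code): one bit per sector, rounded up to whole longs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(long))

/* Bytes needed for a bitmap of @nsectors bits, rounded up to longs */
static size_t bitmap_len(size_t nsectors)
{
	return (nsectors + BITS_PER_LONG - 1) / BITS_PER_LONG * sizeof(long);
}

/* Mark sector @nr as having a csum */
static void bitmap_set_bit(unsigned long *bitmap, size_t nr)
{
	bitmap[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Does sector @nr have a csum? */
static int bitmap_test_bit(const unsigned long *bitmap, size_t nr)
{
	return !!(bitmap[nr / BITS_PER_LONG] & (1UL << (nr % BITS_PER_LONG)));
}
```

The caller allocates bitmap_len(len / sectorsize) bytes, and after the call a clear bit marks a nodatasum sector, which the scrub then skips.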
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
Makefile | 2 +-
csum.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ctree.h | 4 ++
kerncompat.h | 3 ++
utils.h | 5 +++
5 files changed, 143 insertions(+), 1 deletion(-)
create mode 100644 csum.c
diff --git a/Makefile b/Makefile
index 6369e8f4..ab45ab7f 100644
--- a/Makefile
+++ b/Makefile
@@ -106,7 +106,7 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
- fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
+ fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o csum.o
cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
diff --git a/csum.c b/csum.c
new file mode 100644
index 00000000..a2ce755e
--- /dev/null
+++ b/csum.c
@@ -0,0 +1,130 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include "kerncompat.h"
+#include "kernel-lib/bitops.h"
+#include "ctree.h"
+#include "utils.h"
+
+/*
+ * TODO:
+ * 1) Add write support for csum
+ * So we can write new data extents and add csum into csum tree
+ *
+ * Get csums of the range [@start, @start + len).
+ *
+ * @start: Start offset, must be aligned to sectorsize.
+ * @len: Length, must be aligned to sectorsize.
+ * @csum_ret: Csum buffer; its size must be @len / sectorsize * csum_size.
+ * @bitmap_ret: One bit per sector, set if that sector has a csum.
+ * Its size in bytes must be
+ * calculate_bitmap_len(@len / sectorsize).
+ *
+ * Returns 0 on success (at least one csum found)
+ * Returns >0 if no csum was found in the range
+ * Returns <0 on fatal error
+ */
+
+int btrfs_read_data_csums(struct btrfs_fs_info *fs_info, u64 start, u64 len,
+ void *csum_ret, unsigned long *bitmap_ret)
+
+{
+ struct btrfs_path path;
+ struct btrfs_key key;
+ struct btrfs_root *csum_root = fs_info->csum_root;
+ u32 item_offset;
+ u32 item_size;
+ u32 final_offset;
+ u32 final_len;
+ u32 i;
+ u32 sectorsize = fs_info->sectorsize;
+ u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+ u64 cur_start;
+ u64 cur_end;
+ int found = 0;
+ int ret;
+
+ ASSERT(IS_ALIGNED(start, sectorsize));
+ ASSERT(IS_ALIGNED(len, sectorsize));
+ ASSERT(csum_ret);
+ ASSERT(bitmap_ret);
+
+ memset(bitmap_ret, 0, calculate_bitmap_len(len / sectorsize));
+ btrfs_init_path(&path);
+
+ key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ key.type = BTRFS_EXTENT_CSUM_KEY;
+ key.offset = start;
+
+ ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = btrfs_previous_item(csum_root, &path,
+ BTRFS_EXTENT_CSUM_OBJECTID,
+ BTRFS_EXTENT_CSUM_KEY);
+ if (ret < 0)
+ goto out;
+ }
+ /* The csum tree may be empty. */
+ if (!btrfs_header_nritems(path.nodes[0]))
+ goto next;
+
+ while (1) {
+ btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+ if (!IS_ALIGNED(key.offset, sectorsize)) {
+ error("csum item bytenr %llu is not aligned to %u",
+ key.offset, sectorsize);
+ ret = -EIO;
+ break;
+ }
+ /* exceeds end */
+ if (key.offset >= start + len)
+ break;
+
+ item_offset = btrfs_item_ptr_offset(path.nodes[0],
+ path.slots[0]);
+ item_size = btrfs_item_size_nr(path.nodes[0], path.slots[0]);
+
+ if (key.offset + item_size / csum_size * sectorsize < start)
+ goto next;
+
+ /* get start of the extent */
+ cur_start = max(start, key.offset);
+ /* get end of the extent */
+ cur_end = min(start + len, key.offset + item_size / csum_size *
+ sectorsize);
+
+ final_offset = (cur_start - key.offset) / sectorsize *
+ csum_size + item_offset;
+ final_len = (cur_end - cur_start) / sectorsize * csum_size;
+ read_extent_buffer(path.nodes[0],
+ (csum_ret + (cur_start - start) /
+ sectorsize * csum_size),
+ final_offset, final_len);
+
+ for (i = 0; i != final_len / csum_size; i++)
+ set_bit(i + (cur_start - start) / sectorsize,
+ bitmap_ret);
+
+ found = 1;
+next:
+ ret = btrfs_next_item(csum_root, &path);
+ if (ret)
+ break;
+ }
+out:
+ if (ret >= 0)
+ ret = !found;
+ btrfs_release_path(&path);
+ return ret;
+}
diff --git a/ctree.h b/ctree.h
index b92df1c1..a7d26455 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2761,4 +2761,8 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
char *dest);
+/* csum.c */
+int btrfs_read_data_csums(struct btrfs_fs_info *fs_info, u64 start, u64 len,
+ void *csum_ret, unsigned long *bitmap_ret);
+
#endif
diff --git a/kerncompat.h b/kerncompat.h
index fa96715f..4eb62f68 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -273,6 +273,9 @@ static inline int IS_ERR_OR_NULL(const void *ptr)
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))
+#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+#define DIV_ROUND_UP __KERNEL_DIV_ROUND_UP
+
/*
* printk
*/
diff --git a/utils.h b/utils.h
index a82d46f6..5d869a50 100644
--- a/utils.h
+++ b/utils.h
@@ -29,6 +29,7 @@
#include "sizes.h"
#include "messages.h"
#include "ioctl.h"
+#include "kerncompat.h"
#define BTRFS_SCAN_MOUNTED (1ULL << 0)
#define BTRFS_SCAN_LBLKID (1ULL << 1)
@@ -76,6 +77,10 @@ struct seen_fsid {
DIR *dirstream;
int fd;
};
+static inline int calculate_bitmap_len(int nsectors)
+{
+ return (DIV_ROUND_UP(nsectors, BITS_PER_LONG) * sizeof(long));
+}
int btrfs_make_root_dir(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 objectid);
--
2.14.3
* [v6 04/16] btrfs-progs: scrub: Introduce structures to support offline scrub for RAID56
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (2 preceding siblings ...)
2018-01-05 11:01 ` [v6 03/16] btrfs-progs: csum: Introduce function to read out data csums Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 05/16] btrfs-progs: scrub: Introduce functions to scrub mirror based tree block Gu Jinxiang
` (11 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce new local structures, scrub_full_stripe and scrub_stripe, for
the incoming offline RAID56 scrub support.
For pure stripe/mirror based profiles, like raid0/1/10/dup/single, we
will follow the original bytenr and mirror number based iteration, so
no extra structures are needed for those profiles.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
Makefile | 3 +-
scrub.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+), 1 deletion(-)
create mode 100644 scrub.c
diff --git a/Makefile b/Makefile
index ab45ab7f..fa3ebc86 100644
--- a/Makefile
+++ b/Makefile
@@ -106,7 +106,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
- fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o csum.o
+ fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o csum.o \
+ scrub.o
cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
diff --git a/scrub.c b/scrub.c
new file mode 100644
index 00000000..41c40108
--- /dev/null
+++ b/scrub.c
@@ -0,0 +1,119 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+/*
+ * Main part to implement offline(unmounted) btrfs scrub
+ */
+
+#include <unistd.h>
+#include "ctree.h"
+#include "volumes.h"
+#include "disk-io.h"
+#include "utils.h"
+
+/*
+ * For parity based profiles (RAID56).
+ * Mirror/stripe based profiles won't need this; they are iterated by
+ * bytenr and mirror number.
+ */
+struct scrub_stripe {
+ /* For P/Q logical start will be BTRFS_RAID5/6_P/Q_STRIPE */
+ u64 logical;
+
+ u64 physical;
+
+ /* Device is missing */
+ unsigned int dev_missing:1;
+
+ /* Any tree/data csum mismatches */
+ unsigned int csum_mismatch:1;
+
+ /* Some data doesn't have csum (nodatasum) */
+ unsigned int csum_missing:1;
+
+ /* Device fd, to write correct data back to disk */
+ int fd;
+
+ char *data;
+};
+
+/*
+ * RAID56 full stripe (data stripes + P/Q)
+ */
+struct scrub_full_stripe {
+ u64 logical_start;
+ u64 logical_len;
+ u64 bg_type;
+ u32 nr_stripes;
+ u32 stripe_len;
+
+ /* Read error stripes */
+ u32 err_read_stripes;
+
+ /* Missing devices */
+ u32 err_missing_devs;
+
+ /* Csum error data stripes */
+ u32 err_csum_dstripes;
+
+ /* Missing csum data stripes */
+ u32 missing_csum_dstripes;
+
+ /* Corrupted stripe indexes */
+ int corrupted_index[2];
+
+ int nr_corrupted_stripes;
+
+ /* Already recovered once? */
+ unsigned int recovered:1;
+
+ struct scrub_stripe stripes[];
+};
+
+static void free_full_stripe(struct scrub_full_stripe *fstripe)
+{
+ int i;
+
+ for (i = 0; i < fstripe->nr_stripes; i++)
+ free(fstripe->stripes[i].data);
+ free(fstripe);
+}
+
+static struct scrub_full_stripe *alloc_full_stripe(int nr_stripes,
+ u32 stripe_len)
+{
+ struct scrub_full_stripe *ret;
+ int size = sizeof(*ret) + sizeof(unsigned long *) +
+ nr_stripes * sizeof(struct scrub_stripe);
+ int i;
+
+ ret = malloc(size);
+ if (!ret)
+ return NULL;
+
+ memset(ret, 0, size);
+ ret->nr_stripes = nr_stripes;
+ ret->stripe_len = stripe_len;
+ ret->corrupted_index[0] = -1;
+ ret->corrupted_index[1] = -1;
+
+ /* Alloc data memory for each stripe */
+ for (i = 0; i < nr_stripes; i++) {
+ struct scrub_stripe *stripe = &ret->stripes[i];
+
+ stripe->data = malloc(stripe_len);
+ if (!stripe->data) {
+ free_full_stripe(ret);
+ return NULL;
+ }
+ }
+ return ret;
+}
--
2.14.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [v6 05/16] btrfs-progs: scrub: Introduce functions to scrub mirror based tree block
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (3 preceding siblings ...)
2018-01-05 11:01 ` [v6 04/16] btrfs-progs: scrub: Introduce structures to support offline scrub for RAID56 Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 06/16] btrfs-progs: scrub: Introduce functions to scrub mirror based data blocks Gu Jinxiang
` (10 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce new functions, check/recover_tree_mirror(), to check and
recover mirror-based tree blocks (Single/DUP/RAID0/1/10).
check_tree_mirror() can also be used on in-memory tree blocks via the
@data parameter.
This is very handy for RAID5/6 case, either checking the data stripe
tree block by @bytenr and 0 as @mirror, or using @data parameter for
recovered in-memory data.
recover_tree_mirror(), on the other hand, is only used for mirror-based
profiles, as RAID56 recovery is done per stripe, not per mirror.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
disk-io.c | 4 +-
disk-io.h | 2 +
scrub.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 149 insertions(+), 2 deletions(-)
diff --git a/disk-io.c b/disk-io.c
index f5edc479..1abc6f71 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -51,8 +51,8 @@ static u32 max_nritems(u8 level, u32 nodesize)
sizeof(struct btrfs_key_ptr));
}
-static int check_tree_block(struct btrfs_fs_info *fs_info,
- struct extent_buffer *buf)
+int check_tree_block(struct btrfs_fs_info *fs_info,
+ struct extent_buffer *buf)
{
struct btrfs_fs_devices *fs_devices;
diff --git a/disk-io.h b/disk-io.h
index f6a422f2..0ed7624e 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -118,6 +118,8 @@ int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirr
struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
u64 parent_transid);
+int check_tree_block(struct btrfs_fs_info *fs_info,
+ struct extent_buffer *buf);
int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
u64 *len, int mirror);
void readahead_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
diff --git a/scrub.c b/scrub.c
index 41c40108..00786dd3 100644
--- a/scrub.c
+++ b/scrub.c
@@ -117,3 +117,148 @@ static struct scrub_full_stripe *alloc_full_stripe(int nr_stripes,
}
return ret;
}
+
+static inline int is_data_stripe(struct scrub_stripe *stripe)
+{
+ u64 bytenr = stripe->logical;
+
+ if (bytenr == BTRFS_RAID5_P_STRIPE || bytenr == BTRFS_RAID6_Q_STRIPE)
+ return 0;
+ return 1;
+}
+
+/*
+ * Check one tree mirror given by @bytenr and @mirror, or @data.
+ * If @data is not given (NULL), the function will try to read out tree block
+ * using @bytenr and @mirror.
+ * If @data is given, use data directly, won't try to read from disk.
+ *
+ * The extra @data parameter is handy for RAID5/6 recovery code to verify
+ * the recovered data.
+ *
+ * Return 0 if everything is OK.
+ * Return <0 if something goes wrong, and @scrub_ctx accounting will be updated
+ * if it's a data corruption.
+ */
+static int check_tree_mirror(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ char *data, u64 bytenr, int mirror)
+{
+ struct extent_buffer *eb;
+ u32 nodesize = fs_info->nodesize;
+ int ret;
+
+ if (!IS_ALIGNED(bytenr, fs_info->sectorsize)) {
+ /* Unaligned tree block is invalid, count it as a verify error */
+ scrub_ctx->verify_errors++;
+ return -EIO;
+ }
+
+ eb = btrfs_find_create_tree_block(fs_info, bytenr);
+ if (!eb)
+ return -ENOMEM;
+ if (data) {
+ memcpy(eb->data, data, nodesize);
+ } else {
+ ret = read_whole_eb(fs_info, eb, mirror);
+ if (ret) {
+ scrub_ctx->read_errors++;
+ error("failed to read tree block %llu mirror %d",
+ bytenr, mirror);
+ goto out;
+ }
+ }
+
+ scrub_ctx->tree_bytes_scrubbed += nodesize;
+ if (csum_tree_block(fs_info, eb, 1)) {
+ error("tree block %llu mirror %d checksum mismatch", bytenr,
+ mirror);
+ scrub_ctx->csum_errors++;
+ ret = -EIO;
+ goto out;
+ }
+ ret = check_tree_block(fs_info, eb);
+ if (ret < 0) {
+ error("tree block %llu mirror %d is invalid", bytenr, mirror);
+ scrub_ctx->verify_errors++;
+ goto out;
+ }
+
+ scrub_ctx->tree_extents_scrubbed++;
+out:
+ free_extent_buffer(eb);
+ return ret;
+}
+
+/*
+ * read_extent_data() helper
+ *
+ * This function will handle short reads and update @scrub_ctx when a
+ * read error happens.
+ */
+static int read_extent_data_loop(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ char *buf, u64 start, u64 len, int mirror)
+{
+ int ret = 0;
+ u64 cur = 0;
+
+ while (cur < len) {
+ u64 read_len = len - cur;
+
+ ret = read_extent_data(fs_info, buf + cur,
+ start + cur, &read_len, mirror);
+ if (ret < 0) {
+ error("failed to read out data at bytenr %llu mirror %d",
+ start + cur, mirror);
+ scrub_ctx->read_errors++;
+ break;
+ }
+ cur += read_len;
+ }
+ return ret;
+}
+
+/*
+ * Recover all other (corrupted) mirrors for tree block.
+ *
+ * The method is quite simple: read out the good mirror specified by
+ * @good_mirror and write the correct data back to all other mirrors.
+ */
+static int recover_tree_mirror(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ u64 start, int good_mirror)
+{
+ char *buf;
+ u32 nodesize = fs_info->nodesize;
+ int i;
+ int num_copies;
+ int ret;
+
+ buf = malloc(nodesize);
+ if (!buf)
+ return -ENOMEM;
+ ret = read_extent_data_loop(fs_info, scrub_ctx, buf, start, nodesize,
+ good_mirror);
+ if (ret < 0) {
+ error("failed to read tree block at bytenr %llu mirror %d",
+ start, good_mirror);
+ goto out;
+ }
+
+ num_copies = btrfs_num_copies(fs_info, start, nodesize);
+ for (i = 1; i <= num_copies; i++) {
+ if (i == good_mirror)
+ continue;
+ ret = write_data_to_disk(fs_info, buf, start, nodesize, i);
+ if (ret < 0) {
+ error("failed to write tree block at bytenr %llu mirror %d",
+ start, i);
+ goto out;
+ }
+ }
+ ret = 0;
+out:
+ free(buf);
+ return ret;
+}
--
2.14.3
* [v6 06/16] btrfs-progs: scrub: Introduce functions to scrub mirror based data blocks
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (4 preceding siblings ...)
2018-01-05 11:01 ` [v6 05/16] btrfs-progs: scrub: Introduce functions to scrub mirror based tree block Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 07/16] btrfs-progs: scrub: Introduce function to scrub one mirror-based extent Gu Jinxiang
` (9 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo, Su Yue
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce new functions, check/recover_data_mirror(), to check and
recover mirror-based data blocks.
Unlike tree blocks, data blocks must be recovered sector by sector, so a
corrupted bitmap is introduced for both check and recovery.
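Not part of the patch: a minimal standalone sketch (hypothetical names, plain
arrays instead of the patch's bitmaps) of the per-sector recoverability rule
described above. An extent is recoverable as long as every sector still has at
least one mirror with a good csum, exactly as in the cover letter's
interleaved-corruption example:

```c
#include <assert.h>

#define NR_SECTORS 7

/*
 * corrupt[i][s] != 0 means sector s of mirror i failed its csum check.
 * The extent is recoverable iff every sector has at least one good mirror.
 */
static int extent_recoverable(int num_copies,
			      const int corrupt[][NR_SECTORS])
{
	for (int s = 0; s < NR_SECTORS; s++) {
		int has_good = 0;

		for (int i = 0; i < num_copies; i++) {
			if (!corrupt[i][s])
				has_good = 1;
		}
		if (!has_good)
			return 0;	/* no mirror has a good copy */
	}
	return 1;
}
```

With interleaved corruption (mirror 0 bad at sectors 0/2/4, mirror 1 bad at
1/3/5) this reports recoverable; if both mirrors are bad at the same sector it
reports unrecoverable, which is why whole-extent checking (as in v3) was too
coarse.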
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
scrub.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 212 insertions(+)
diff --git a/scrub.c b/scrub.c
index 00786dd3..cee6fe14 100644
--- a/scrub.c
+++ b/scrub.c
@@ -18,6 +18,7 @@
#include "volumes.h"
#include "disk-io.h"
#include "utils.h"
+#include "kernel-lib/bitops.h"
/*
* For parity based profile (RAID56)
@@ -262,3 +263,214 @@ out:
free(buf);
return ret;
}
+
+/*
+ * Check one data mirror given by @start, @len and @mirror, or @data
+ * If @data is not given, try to read it from disk.
+ * This function will try to read out all the data then check sum.
+ *
+ * If @data is given, just use the data.
+ * This behavior is useful for RAID5/6 recovery code to verify recovered data.
+ *
+ * If @corrupt_bitmap is given, record corrupted sectors in that bitmap.
+ * This is useful for mirror based profiles to recover its data.
+ *
+ * Return 0 if everything is OK.
+ * Return <0 if something goes wrong, and @scrub_ctx accounting will be updated
+ * if it's a data corruption.
+ */
+static int check_data_mirror(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ char *data, u64 start, u64 len, int mirror,
+ unsigned long *corrupt_bitmap)
+{
+ u32 sectorsize = fs_info->sectorsize;
+ u32 data_csum;
+ u32 *csums = NULL;
+ char *buf = NULL;
+ int ret = 0;
+ int err = 0;
+ int i;
+ unsigned long *csum_bitmap = NULL;
+
+ if (!data) {
+ buf = malloc(len);
+ if (!buf)
+ return -ENOMEM;
+ ret = read_extent_data_loop(fs_info, scrub_ctx, buf, start,
+ len, mirror);
+ if (ret < 0)
+ goto out;
+ scrub_ctx->data_bytes_scrubbed += len;
+ } else {
+ buf = data;
+ }
+
+ /* Allocate and check csums */
+ csums = malloc(len / sectorsize * sizeof(data_csum));
+ if (!csums) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ csum_bitmap = malloc(calculate_bitmap_len(len / sectorsize));
+ if (!csum_bitmap) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (corrupt_bitmap)
+ memset(corrupt_bitmap, 0,
+ calculate_bitmap_len(len / sectorsize));
+ ret = btrfs_read_data_csums(fs_info, start, len, csums, csum_bitmap);
+ if (ret < 0)
+ goto out;
+
+ for (i = 0; i < len / sectorsize; i++) {
+ if (!test_bit(i, csum_bitmap)) {
+ scrub_ctx->csum_discards++;
+ continue;
+ }
+
+ data_csum = ~(u32)0;
+ data_csum = btrfs_csum_data(buf + i * sectorsize, data_csum,
+ sectorsize);
+ btrfs_csum_final(data_csum, (u8 *)&data_csum);
+
+ if (memcmp(&data_csum, (char *)csums + i * sizeof(data_csum),
+ sizeof(data_csum))) {
+ error("data at bytenr %llu mirror %d csum mismatch, have 0x%08x expect 0x%08x",
+ start + i * sectorsize, mirror, data_csum,
+ *(u32 *)((char *)csums + i * sizeof(data_csum)));
+ err = 1;
+ scrub_ctx->csum_errors++;
+ if (corrupt_bitmap)
+ set_bit(i, corrupt_bitmap);
+ continue;
+ }
+ scrub_ctx->data_bytes_scrubbed += sectorsize;
+ }
+out:
+ if (!data)
+ free(buf);
+ free(csums);
+ free(csum_bitmap);
+
+ if (!ret && err)
+ return -EIO;
+ return ret;
+}
+
+/* Helper to check all mirrors for a good copy */
+static int has_good_mirror(unsigned long *corrupt_bitmaps[], int num_copies,
+ int bit, int *good_mirror)
+{
+ int found_good = 0;
+ int i;
+
+ for (i = 0; i < num_copies; i++) {
+ if (!test_bit(bit, corrupt_bitmaps[i])) {
+ found_good = 1;
+ if (good_mirror)
+ *good_mirror = i + 1;
+ break;
+ }
+ }
+ return found_good;
+}
+
+/*
+ * Helper function to check @corrupt_bitmaps, to verify if it's recoverable
+ * for mirror based data extent.
+ *
+ * Return 1 for recoverable, and 0 for not recoverable
+ */
+static int check_data_mirror_recoverable(struct btrfs_fs_info *fs_info,
+ u64 start, u64 len, u32 sectorsize,
+ unsigned long *corrupt_bitmaps[])
+{
+ int i;
+ int corrupted = 0;
+ int bit;
+ int num_copies = btrfs_num_copies(fs_info, start, len);
+
+ for (i = 0; i < num_copies; i++) {
+ for_each_set_bit(bit, corrupt_bitmaps[i], len / sectorsize) {
+ if (!has_good_mirror(corrupt_bitmaps, num_copies,
+ bit, NULL)) {
+ corrupted = 1;
+ goto out;
+ }
+ }
+ }
+out:
+ return !corrupted;
+}
+
+/*
+ * Try to recover all corrupted sectors specified by @corrupt_bitmaps,
+ * by reading out good sector in other mirror.
+ */
+static int recover_data_mirror(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ u64 start, u64 len,
+ unsigned long *corrupt_bitmaps[])
+{
+ char *buf;
+ u32 sectorsize = fs_info->sectorsize;
+ int ret = 0;
+ int bit;
+ int i;
+ int bad_mirror;
+ int num_copies;
+
+ /* Don't bother to recover unrecoverable extents */
+ if (!check_data_mirror_recoverable(fs_info, start, len,
+ sectorsize, corrupt_bitmaps))
+ return -EIO;
+
+ buf = malloc(sectorsize);
+ if (!buf)
+ return -ENOMEM;
+
+ num_copies = btrfs_num_copies(fs_info, start, len);
+ for (i = 0; i < num_copies; i++) {
+ for_each_set_bit(bit, corrupt_bitmaps[i], BITS_PER_LONG) {
+ u64 cur = start + bit * sectorsize;
+ int good;
+
+ /* Find good mirror */
+ ret = has_good_mirror(corrupt_bitmaps, num_copies, bit,
+ &good);
+ if (!ret) {
+ error("failed to find good mirror for bytenr %llu",
+ cur);
+ ret = -EIO;
+ goto out;
+ }
+ /* Read out good mirror */
+ ret = read_data_from_disk(fs_info, buf, cur,
+ sectorsize, good);
+ if (ret < 0) {
+ error("failed to read good mirror from bytenr %llu mirror %d",
+ cur, good);
+ goto out;
+ }
+ /* Write back to all other mirrors */
+ for (bad_mirror = 1; bad_mirror <= num_copies;
+ bad_mirror++) {
+ if (bad_mirror == good)
+ continue;
+ ret = write_data_to_disk(fs_info, buf, cur,
+ sectorsize, bad_mirror);
+ if (ret < 0) {
+ error("failed to recover mirror for bytenr %llu mirror %d",
+ cur, bad_mirror);
+ goto out;
+ }
+ }
+ }
+ }
+out:
+ free(buf);
+ return ret;
+}
--
2.14.3
* [v6 07/16] btrfs-progs: scrub: Introduce function to scrub one mirror-based extent
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (5 preceding siblings ...)
2018-01-05 11:01 ` [v6 06/16] btrfs-progs: scrub: Introduce functions to scrub mirror based data blocks Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 08/16] btrfs-progs: scrub: Introduce function to scrub one data stripe Gu Jinxiang
` (8 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, scrub_one_extent(), as a wrapper to check one
mirror-based extent.
It will accept a btrfs_path parameter @path, which must point to a
META/EXTENT_ITEM.
And @start, @len, which must be a subset of META/EXTENT_ITEM.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
scrub.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 147 insertions(+), 1 deletion(-)
diff --git a/scrub.c b/scrub.c
index cee6fe14..b0a98b98 100644
--- a/scrub.c
+++ b/scrub.c
@@ -434,7 +434,7 @@ static int recover_data_mirror(struct btrfs_fs_info *fs_info,
num_copies = btrfs_num_copies(fs_info, start, len);
for (i = 0; i < num_copies; i++) {
- for_each_set_bit(bit, corrupt_bitmaps[i], BITS_PER_LONG) {
+ for_each_set_bit(bit, corrupt_bitmaps[i], len / sectorsize) {
u64 cur = start + bit * sectorsize;
int good;
@@ -474,3 +474,149 @@ out:
free(buf);
return ret;
}
+
+/* Btrfs only supports up to 2 copies of data for now */
+#define BTRFS_MAX_COPIES 2
+
+/*
+ * Check all copies of range @start, @len.
+ * Caller must ensure the range is covered by EXTENT_ITEM/METADATA_ITEM
+ * specified by leaf of @path.
+ * And @start, @len must be a subset of the EXTENT_ITEM/METADATA_ITEM.
+ *
+ * Return 0 if the range is all OK or recovered or recoverable.
+ * Return <0 if the range can't be recovered.
+ */
+static int scrub_one_extent(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ struct btrfs_path *path, u64 start, u64 len,
+ int write)
+{
+ struct btrfs_key key;
+ struct btrfs_extent_item *ei;
+ struct extent_buffer *leaf = path->nodes[0];
+ u32 sectorsize = fs_info->sectorsize;
+ unsigned long *corrupt_bitmaps[BTRFS_MAX_COPIES] = { NULL };
+ int slot = path->slots[0];
+ int num_copies;
+ int meta_corrupted = 0;
+ int meta_good_mirror = 0;
+ int data_bad_mirror = 0;
+ u64 extent_start;
+ u64 extent_len;
+ int metadata = 0;
+ int i;
+ int ret = 0;
+
+ btrfs_item_key_to_cpu(leaf, &key, slot);
+ if (key.type != BTRFS_METADATA_ITEM_KEY &&
+ key.type != BTRFS_EXTENT_ITEM_KEY)
+ goto invalid_arg;
+
+ extent_start = key.objectid;
+ if (key.type == BTRFS_METADATA_ITEM_KEY) {
+ extent_len = fs_info->nodesize;
+ metadata = 1;
+ } else {
+ extent_len = key.offset;
+ ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+ if (btrfs_extent_flags(leaf, ei) & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+ metadata = 1;
+ }
+ if (start >= extent_start + extent_len ||
+ start + len <= extent_start)
+ goto invalid_arg;
+
+ for (i = 0; i < BTRFS_MAX_COPIES; i++) {
+ corrupt_bitmaps[i] = malloc(
+ calculate_bitmap_len(len / sectorsize));
+ if (!corrupt_bitmaps[i])
+ goto out;
+ }
+ num_copies = btrfs_num_copies(fs_info, start, len);
+ for (i = 1; i <= num_copies; i++) {
+ if (metadata) {
+ ret = check_tree_mirror(fs_info, scrub_ctx,
+ NULL, extent_start, i);
+ scrub_ctx->tree_extents_scrubbed++;
+ if (ret < 0)
+ meta_corrupted++;
+ else
+ meta_good_mirror = i;
+ } else {
+ ret = check_data_mirror(fs_info, scrub_ctx, NULL, start,
+ len, i, corrupt_bitmaps[i - 1]);
+ scrub_ctx->data_extents_scrubbed++;
+ }
+ }
+
+ /* Metadata recover and report */
+ if (metadata) {
+ if (!meta_corrupted) {
+ goto out;
+ } else if (meta_corrupted && meta_corrupted < num_copies) {
+ if (write) {
+ ret = recover_tree_mirror(fs_info, scrub_ctx,
+ start, meta_good_mirror);
+ if (ret < 0) {
+ error("failed to recover tree block at bytenr %llu",
+ start);
+ goto out;
+ }
+ printf("extent %llu len %llu REPAIRED: has corrupted mirror, repaired\n",
+ start, len);
+ goto out;
+ }
+ printf("extent %llu len %llu RECOVERABLE: has corrupted mirror, but is recoverable\n",
+ start, len);
+ goto out;
+ } else {
+ error("extent %llu len %llu CORRUPTED: all mirror(s) corrupted, can't be recovered",
+ start, len);
+ ret = -EIO;
+ goto out;
+ }
+ }
+ /* Data recover and report */
+ for (i = 0; i < num_copies; i++) {
+ if (find_first_bit(corrupt_bitmaps[i], len / sectorsize) >=
+ len / sectorsize)
+ continue;
+ data_bad_mirror = i + 1;
+ }
+ /* All data sectors are good */
+ if (!data_bad_mirror) {
+ ret = 0;
+ goto out;
+ }
+
+ if (check_data_mirror_recoverable(fs_info, start, len,
+ sectorsize, corrupt_bitmaps)) {
+ if (write) {
+ ret = recover_data_mirror(fs_info, scrub_ctx, start,
+ len, corrupt_bitmaps);
+ if (ret < 0) {
+ error("failed to recover data extent at bytenr %llu len %llu",
+ start, len);
+ goto out;
+ }
+ printf("extent %llu len %llu REPAIRED: has corrupted mirror, repaired\n",
+ start, len);
+ goto out;
+ }
+ printf("extent %llu len %llu RECOVERABLE: has corrupted mirror, recoverable\n",
+ start, len);
+ goto out;
+ }
+ error("extent %llu len %llu CORRUPTED, all mirror(s) corrupted, can't be repaired",
+ start, len);
+ ret = -EIO;
+out:
+ for (i = 0; i < BTRFS_MAX_COPIES; i++)
+ kfree(corrupt_bitmaps[i]);
+ return ret;
+
+invalid_arg:
+ error("invalid parameter for %s", __func__);
+ return -EINVAL;
+}
--
2.14.3
* [v6 08/16] btrfs-progs: scrub: Introduce function to scrub one data stripe
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (6 preceding siblings ...)
2018-01-05 11:01 ` [v6 07/16] btrfs-progs: scrub: Introduce function to scrub one mirror-based extent Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 09/16] btrfs-progs: scrub: Introduce function to verify parities Gu Jinxiang
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce new function, scrub_one_data_stripe(), to check all data and
tree blocks inside the data stripe.
This function will not try to recover any error; it only checks whether
any data/tree block has a csum mismatch.
Data missing its csum, which is completely valid for cases like
nodatasum, is just recorded, not reported as an error.
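Not part of the patch: the patch's check_start/check_len computation clips each
extent to the stripe being scrubbed, since an extent may begin before or end
after the data stripe. A minimal standalone sketch of that clipping (names are
hypothetical, not the btrfs-progs API):

```c
#include <assert.h>

typedef unsigned long long u64;

static u64 max64(u64 a, u64 b) { return a > b ? a : b; }
static u64 min64(u64 a, u64 b) { return a < b ? a : b; }

/*
 * Clip extent [extent_start, extent_start + extent_len) to the stripe
 * range [stripe_logical, stripe_logical + stripe_len).
 * Caller must have checked that the two ranges overlap.
 */
static void clip_extent_to_stripe(u64 extent_start, u64 extent_len,
				  u64 stripe_logical, u64 stripe_len,
				  u64 *check_start, u64 *check_len)
{
	*check_start = max64(extent_start, stripe_logical);
	*check_len = min64(extent_start + extent_len,
			   stripe_logical + stripe_len) - *check_start;
}
```

E.g. an extent starting at 4K with length 64K, checked against a 64K stripe at
64K, is clipped to the 4K tail that actually lives inside the stripe.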
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
scrub.c | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)
diff --git a/scrub.c b/scrub.c
index b0a98b98..5c1c3957 100644
--- a/scrub.c
+++ b/scrub.c
@@ -620,3 +620,132 @@ invalid_arg:
error("invalid parameter for %s", __func__);
return -EINVAL;
}
+
+/*
+ * Scrub one full data stripe of RAID5/6.
+ * This means it will check any data/metadata extent in the data stripe
+ * specified by @stripe and @stripe_len
+ *
+ * This function will only *CHECK* if the data stripe has any corruption.
+ * It won't do any repair.
+ *
+ * Return 0 if the full stripe is OK.
+ * Return <0 if any error is found.
+ * Note: Missing csum is not counted as error (NODATASUM is valid)
+ */
+static int scrub_one_data_stripe(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ struct scrub_stripe *stripe, u32 stripe_len)
+{
+ struct btrfs_path *path;
+ struct btrfs_root *extent_root = fs_info->extent_root;
+ struct btrfs_key key;
+ u64 extent_start;
+ u64 extent_len;
+ u64 orig_csum_discards;
+ int ret;
+
+ if (!is_data_stripe(stripe))
+ return -EINVAL;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ key.objectid = stripe->logical + stripe_len;
+ key.offset = 0;
+ key.type = 0;
+
+ ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto out;
+ while (1) {
+ struct btrfs_extent_item *ei;
+ struct extent_buffer *eb;
+ char *data;
+ int slot;
+ int metadata = 0;
+ u64 check_start;
+ u64 check_len;
+
+ ret = btrfs_previous_extent_item(extent_root, path, 0);
+ if (ret > 0) {
+ ret = 0;
+ goto out;
+ }
+ if (ret < 0)
+ goto out;
+ eb = path->nodes[0];
+ slot = path->slots[0];
+ btrfs_item_key_to_cpu(eb, &key, slot);
+ extent_start = key.objectid;
+ ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item);
+
+ /* tree block scrub */
+ if (key.type == BTRFS_METADATA_ITEM_KEY ||
+ btrfs_extent_flags(eb, ei) & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+ extent_len = extent_root->fs_info->nodesize;
+ metadata = 1;
+ } else {
+ extent_len = key.offset;
+ metadata = 0;
+ }
+
+ /* Current extent is out of our range, loop comes to end */
+ if (extent_start + extent_len <= stripe->logical)
+ break;
+
+ if (metadata) {
+ /*
+ * Check crossing stripe first, which can't be scrubbed
+ */
+ if (check_crossing_stripes(fs_info, extent_start,
+ extent_root->fs_info->nodesize)) {
+ error("tree block at %llu is crossing stripe boundary, unable to scrub",
+ extent_start);
+ ret = -EIO;
+ goto out;
+ }
+ data = stripe->data + extent_start - stripe->logical;
+ ret = check_tree_mirror(fs_info, scrub_ctx,
+ data, extent_start, 0);
+ /* Any csum/verify error means the stripe is screwed */
+ if (ret < 0) {
+ stripe->csum_mismatch = 1;
+ ret = -EIO;
+ goto out;
+ }
+ ret = 0;
+ continue;
+ }
+ /* Restrict the extent range to fit stripe range */
+ check_start = max(extent_start, stripe->logical);
+ check_len = min(extent_start + extent_len, stripe->logical +
+ stripe_len) - check_start;
+
+ /* Record original csum_discards to detect missing csum case */
+ orig_csum_discards = scrub_ctx->csum_discards;
+
+ data = stripe->data + check_start - stripe->logical;
+ ret = check_data_mirror(fs_info, scrub_ctx, data, check_start,
+ check_len, 0, NULL);
+ /* Csum mismatch, no need to continue anyway */
+ if (ret < 0) {
+ stripe->csum_mismatch = 1;
+ goto out;
+ }
+ /* Check if there is any missing csum for data */
+ if (scrub_ctx->csum_discards != orig_csum_discards)
+ stripe->csum_missing = 1;
+ /*
+ * Only increase data_extents_scrubbed if we are scrubbing the
+ * tailing part of the data extent
+ */
+ if (extent_start + extent_len <= stripe->logical + stripe_len)
+ scrub_ctx->data_extents_scrubbed++;
+ ret = 0;
+ }
+out:
+ btrfs_free_path(path);
+ return ret;
+}
--
2.14.3
* [v6 09/16] btrfs-progs: scrub: Introduce function to verify parities
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (7 preceding siblings ...)
2018-01-05 11:01 ` [v6 08/16] btrfs-progs: scrub: Introduce function to scrub one data stripe Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 10/16] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Gu Jinxiang
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce new function, verify_parities(), to check whether the parities
match a full stripe whose data stripes already match their csums.
Caller should fill the scrub_full_stripe structure properly before
calling this function.
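Not part of the patch: for RAID5 the parity check reduces to recomputing the
byte-wise XOR of the data stripes and comparing it with the on-disk P stripe
(RAID6 additionally checks the Q syndrome via raid6_gen_syndrome()). A tiny
standalone sketch with hypothetical names and a shrunken stripe length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRIPE_LEN 16	/* tiny stripe for illustration; btrfs uses 64K */

/* RAID5 parity is the byte-wise XOR of all data stripes */
static void raid5_gen_parity(int nr_data, unsigned char *data[],
			     unsigned char *parity)
{
	memset(parity, 0, STRIPE_LEN);
	for (int d = 0; d < nr_data; d++)
		for (size_t i = 0; i < STRIPE_LEN; i++)
			parity[i] ^= data[d][i];
}

/* Return 0 if the on-disk parity matches the recomputed one, 1 otherwise */
static int raid5_verify_parity(int nr_data, unsigned char *data[],
			       const unsigned char *ondisk_p)
{
	unsigned char buf[STRIPE_LEN];

	raid5_gen_parity(nr_data, data, buf);
	return memcmp(buf, ondisk_p, STRIPE_LEN) ? 1 : 0;
}
```

A mismatch here is exactly the P/Q corruption case that kernel scrub does not
account, which is the motivation for this offline tool.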
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
scrub.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/scrub.c b/scrub.c
index 5c1c3957..3db82656 100644
--- a/scrub.c
+++ b/scrub.c
@@ -19,6 +19,7 @@
#include "disk-io.h"
#include "utils.h"
#include "kernel-lib/bitops.h"
+#include "kernel-lib/raid56.h"
/*
* For parity based profile (RAID56)
@@ -749,3 +750,71 @@ out:
btrfs_free_path(path);
return ret;
}
+
+/*
+ * Verify parities for RAID56
+ * Caller must fill @fstripe before calling this function
+ *
+ * Return 0 if the parities match.
+ * Return >0 for P or Q mismatch
+ * Return <0 for fatal error
+ */
+static int verify_parities(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ struct scrub_full_stripe *fstripe)
+{
+ void **ptrs;
+ void *ondisk_p = NULL;
+ void *ondisk_q = NULL;
+ void *buf_p;
+ void *buf_q;
+ int nr_stripes = fstripe->nr_stripes;
+ int stripe_len = BTRFS_STRIPE_LEN;
+ int i;
+ int ret = 0;
+
+ ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
+ buf_p = malloc(fstripe->stripe_len);
+ buf_q = malloc(fstripe->stripe_len);
+ if (!ptrs || !buf_p || !buf_q) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ for (i = 0; i < fstripe->nr_stripes; i++) {
+ struct scrub_stripe *stripe = &fstripe->stripes[i];
+
+ if (stripe->logical == BTRFS_RAID5_P_STRIPE) {
+ ondisk_p = stripe->data;
+ ptrs[i] = buf_p;
+ continue;
+ } else if (stripe->logical == BTRFS_RAID6_Q_STRIPE) {
+ ondisk_q = stripe->data;
+ ptrs[i] = buf_q;
+ continue;
+ } else {
+ ptrs[i] = stripe->data;
+ continue;
+ }
+ }
+ /* RAID6 */
+ if (ondisk_q) {
+ raid6_gen_syndrome(nr_stripes, stripe_len, ptrs);
+
+ if (memcmp(ondisk_q, ptrs[nr_stripes - 1], stripe_len) != 0 ||
+ memcmp(ondisk_p, ptrs[nr_stripes - 2], stripe_len))
+ ret = 1;
+ } else {
+ ret = raid5_gen_result(nr_stripes, stripe_len, nr_stripes - 1,
+ ptrs);
+ if (ret < 0)
+ goto out;
+ if (memcmp(ondisk_p, ptrs[nr_stripes - 1], stripe_len) != 0)
+ ret = 1;
+ }
+out:
+ free(buf_p);
+ free(buf_q);
+ free(ptrs);
+ return ret;
+}
--
2.14.3
* [v6 10/16] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range.
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (8 preceding siblings ...)
2018-01-05 11:01 ` [v6 09/16] btrfs-progs: scrub: Introduce function to verify parities Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 11/16] btrfs-progs: scrub: Introduce function to recover data parity Gu Jinxiang
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, btrfs_check_extent_exists(), to check if there
is any extent in the range specified by user.
The range can be large; if any extent exists in it, the function will
return >0 (in fact, 1).
It returns 0 if no extent is found.
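Not part of the patch: the function's trick is that after seeking to the first
extent item below (start + len), the found extent is guaranteed to start before
the end of the queried range, so only the extent's end still needs checking. A
standalone sketch of the interval logic (hypothetical names):

```c
#include <assert.h>

typedef unsigned long long u64;

/* Generic half-open interval overlap: [a, a+al) vs [b, b+bl) */
static int ranges_overlap(u64 a, u64 al, u64 b, u64 bl)
{
	return a < b + bl && b < a + al;
}

/*
 * What btrfs_check_extent_exists() reduces to: the tree search plus one
 * btrfs_previous_extent_item() step already guarantee
 * extent_start < start + len, so only the extent end is left to check.
 */
static int extent_overlaps_range(u64 extent_start, u64 extent_len, u64 start)
{
	return extent_start + extent_len > start;
}
```

So an extent [0, 4K) does not overlap a range starting at 4K (half-open
intervals), while [0, 8K) does.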
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
ctree.h | 2 ++
extent-tree.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/ctree.h b/ctree.h
index a7d26455..7d58cb33 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2521,6 +2521,8 @@ int exclude_super_stripes(struct btrfs_root *root,
u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
struct btrfs_fs_info *info, u64 start, u64 end);
u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
+int btrfs_check_extent_exists(struct btrfs_fs_info *fs_info, u64 start,
+ u64 len);
/* ctree.c */
int btrfs_comp_cpu_keys(struct btrfs_key *k1, struct btrfs_key *k2);
diff --git a/extent-tree.c b/extent-tree.c
index 055582c3..3af0c1f1 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -4256,3 +4256,63 @@ u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
return total_added;
}
+
+/*
+ * Check if there is any extent(both data and metadata) in the range
+ * [@start, @start + @len)
+ *
+ * Return 0 for no extent found.
+ * Return >0 for found extent.
+ * Return <0 for fatal error.
+ */
+int btrfs_check_extent_exists(struct btrfs_fs_info *fs_info, u64 start,
+ u64 len)
+{
+ struct btrfs_path *path;
+ struct btrfs_key key;
+ u64 extent_start;
+ u64 extent_len;
+ int ret;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ key.objectid = start + len;
+ key.type = 0;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(NULL, fs_info->extent_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto out;
+ /*
+ * Now we're pointing at the slot whose key.objectid >= start + len,
+ * so step back to the previous extent item.
+ */
+ ret = btrfs_previous_extent_item(fs_info->extent_root, path, 0);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = 0;
+ goto out;
+ }
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+ extent_start = key.objectid;
+ if (key.type == BTRFS_METADATA_ITEM_KEY)
+ extent_len = fs_info->nodesize;
+ else
+ extent_len = key.offset;
+
+ /*
+ * search_slot() and previous_extent_item() have ensured that our
+ * extent_start < start + len, so we only need to check the extent end.
+ */
+ if (extent_start + extent_len <= start)
+ ret = 0;
+ else
+ ret = 1;
+
+out:
+ btrfs_free_path(path);
+ return ret;
+}
--
2.14.3
* [v6 11/16] btrfs-progs: scrub: Introduce function to recover data parity
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (9 preceding siblings ...)
2018-01-05 11:01 ` [v6 10/16] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 12/16] btrfs-progs: scrub: Introduce helper to write a full stripe Gu Jinxiang
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a function, recover_from_parities(), to recover data stripes.
It wraps raid56_recov() with extra sanity checks against the
scrub_full_stripe structure.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
scrub.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/scrub.c b/scrub.c
index 3db82656..d88c52e1 100644
--- a/scrub.c
+++ b/scrub.c
@@ -818,3 +818,54 @@ out:
free(ptrs);
return ret;
}
+
+/*
+ * Try to recover data stripe from P or Q stripe
+ *
+ * Return >0 if recovery can't be tried any more.
+ * Return 0 for successful repair or no need to repair at all
+ * Return <0 for fatal error
+ */
+static int recover_from_parities(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ struct scrub_full_stripe *fstripe)
+{
+ void **ptrs;
+ int nr_stripes = fstripe->nr_stripes;
+ int stripe_len = BTRFS_STRIPE_LEN;
+ int max_tolerance;
+ int i;
+ int ret;
+
+ /* No need to recover */
+ if (!fstripe->nr_corrupted_stripes)
+ return 0;
+
+ /* Already recovered once, no more chance */
+ if (fstripe->recovered)
+ return 1;
+
+ if (fstripe->bg_type & BTRFS_BLOCK_GROUP_RAID5)
+ max_tolerance = 1;
+ else
+ max_tolerance = 2;
+
+ /* Out of repair */
+ if (fstripe->nr_corrupted_stripes > max_tolerance)
+ return 1;
+
+ ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
+ if (!ptrs)
+ return -ENOMEM;
+
+ /* Construct ptrs */
+ for (i = 0; i < nr_stripes; i++)
+ ptrs[i] = fstripe->stripes[i].data;
+
+ ret = raid56_recov(nr_stripes, stripe_len, fstripe->bg_type,
+ fstripe->corrupted_index[0],
+ fstripe->corrupted_index[1], ptrs);
+ fstripe->recovered = 1;
+ free(ptrs);
+ return ret;
+}
--
2.14.3
* [v6 12/16] btrfs-progs: scrub: Introduce helper to write a full stripe
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (10 preceding siblings ...)
2018-01-05 11:01 ` [v6 11/16] btrfs-progs: scrub: Introduce function to recover data parity Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 13/16] btrfs-progs: scrub: Introduce a function to scrub one " Gu Jinxiang
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce an internal helper, write_full_stripe(), to calculate P/Q and
write the whole full stripe.
This is useful for recovering RAID56 stripes.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
scrub.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/scrub.c b/scrub.c
index d88c52e1..83f02a95 100644
--- a/scrub.c
+++ b/scrub.c
@@ -869,3 +869,47 @@ static int recover_from_parities(struct btrfs_fs_info *fs_info,
free(ptrs);
return ret;
}
+
+/*
+ * Helper to write a full stripe to disk
+ * P/Q will be re-calculated.
+ */
+static int write_full_stripe(struct scrub_full_stripe *fstripe)
+{
+ void **ptrs;
+ int nr_stripes = fstripe->nr_stripes;
+ int stripe_len = BTRFS_STRIPE_LEN;
+ int i;
+ int ret = 0;
+
+ ptrs = malloc(sizeof(void *) * fstripe->nr_stripes);
+ if (!ptrs)
+ return -ENOMEM;
+
+ for (i = 0; i < fstripe->nr_stripes; i++)
+ ptrs[i] = fstripe->stripes[i].data;
+
+ if (fstripe->bg_type & BTRFS_BLOCK_GROUP_RAID6) {
+ raid6_gen_syndrome(nr_stripes, stripe_len, ptrs);
+ } else {
+ ret = raid5_gen_result(nr_stripes, stripe_len, nr_stripes - 1,
+ ptrs);
+ if (ret < 0)
+ goto out;
+ }
+
+ for (i = 0; i < fstripe->nr_stripes; i++) {
+ struct scrub_stripe *stripe = &fstripe->stripes[i];
+
+ ret = pwrite(stripe->fd, stripe->data, fstripe->stripe_len,
+ stripe->physical);
+ if (ret != fstripe->stripe_len) {
+ ret = -EIO;
+ goto out;
+ }
+ }
+out:
+ free(ptrs);
+ return ret;
+
+}
--
2.14.3
* [v6 13/16] btrfs-progs: scrub: Introduce a function to scrub one full stripe
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (11 preceding siblings ...)
2018-01-05 11:01 ` [v6 12/16] btrfs-progs: scrub: Introduce helper to write a full stripe Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 14/16] btrfs-progs: scrub: Introduce function to check a whole block group Gu Jinxiang
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, scrub_one_full_stripe(), to check a full
stripe.
It handles the full stripe scrub in the following steps:
0) Check whether we need to check the full stripe
If the full stripe contains no extent, why waste CPU and IO?
1) Read out the full stripe
Then we know how many devices are missing or have read errors.
If it's beyond repair, exit.
If a device is missing or has a read error, try recovery here.
2) Check data stripes against csums
Data stripes with csum errors are counted as corrupted stripes, just
like missing devices or read errors.
Then recheck whether the csum mismatch count is still within tolerance.
Finally we check the full stripe using only 2 factors:
A) Whether the full stripe ever went through recovery
B) Whether the full stripe has a csum error
Combining factors A and B we get:
1) A && B: Recovered, csum mismatch
Screwed up totally
2) A && !B: Recovered, csum match
Recoverable, data corrupted but P/Q is good to recover
3) !A && B: Not recovered, csum mismatch
Try to recover corrupted data stripes
If recovered csum match, then recoverable
Else, screwed up
4) !A && !B: Not recovered, no csum mismatch
Best case, just check if P/Q matches.
If P/Q matches, everything is good
Else, just P/Q is screwed up, still recoverable.
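The four combinations above can be modeled as a small standalone helper. This is a hypothetical sketch for illustration only, using made-up names; the actual checks live in scrub_one_full_stripe() in the patch below:

```c
/*
 * Hypothetical model of the verdict table above; illustration only,
 * not code from the patch. "recovered" is factor A (the full stripe
 * went through recovery), "csum_mismatch" is factor B (a data stripe
 * csum error remains).
 */
enum verdict {
	VERDICT_GOOD,		/* everything matches, just verify P/Q */
	VERDICT_RECOVERABLE,	/* data corrupted, but P/Q can fix it */
	VERDICT_CORRUPTED,	/* screwed up totally */
	VERDICT_RECHECK,	/* try recovery, then recheck csums */
};

static enum verdict full_stripe_verdict(int recovered, int csum_mismatch)
{
	if (recovered && csum_mismatch)
		return VERDICT_CORRUPTED;	/* case 1) A && B */
	if (recovered)
		return VERDICT_RECOVERABLE;	/* case 2) A && !B */
	if (csum_mismatch)
		return VERDICT_RECHECK;		/* case 3) !A && B */
	return VERDICT_GOOD;			/* case 4) !A && !B */
}
```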
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
scrub.c | 285 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 285 insertions(+)
diff --git a/scrub.c b/scrub.c
index 83f02a95..e474b18a 100644
--- a/scrub.c
+++ b/scrub.c
@@ -911,5 +911,290 @@ static int write_full_stripe(struct scrub_full_stripe *fstripe)
out:
free(ptrs);
return ret;
+}
+
+/*
+ * Return 0 if we still have chance to recover
+ * Return <0 if we have no more chance
+ */
+static int report_recoverablity(struct scrub_full_stripe *fstripe)
+{
+ int max_tolerance;
+ u64 start = fstripe->logical_start;
+
+ if (fstripe->bg_type & BTRFS_BLOCK_GROUP_RAID5)
+ max_tolerance = 1;
+ else
+ max_tolerance = 2;
+
+ if (fstripe->nr_corrupted_stripes > max_tolerance) {
+ error(
"full stripe %llu CORRUPTED: too many read errors or corrupted devices",
+ start);
+ error(
+ "full stripe %llu: tolerance: %d, missing: %d, read error: %d, csum error: %d",
start, max_tolerance, fstripe->err_missing_devs,
fstripe->err_read_stripes, fstripe->err_csum_dstripes);
+ return -EIO;
+ }
+ return 0;
+}
+
+static void clear_corrupted_stripe_record(struct scrub_full_stripe *fstripe)
+{
+ fstripe->corrupted_index[0] = -1;
+ fstripe->corrupted_index[1] = -1;
+ fstripe->nr_corrupted_stripes = 0;
+}
+
+static void record_corrupted_stripe(struct scrub_full_stripe *fstripe,
+ int index)
+{
+ int i = 0;
+
+ for (i = 0; i < 2; i++) {
+ if (fstripe->corrupted_index[i] == -1) {
+ fstripe->corrupted_index[i] = index;
+ break;
+ }
+ }
+ fstripe->nr_corrupted_stripes++;
+}
+
+/*
+ * Scrub one full stripe.
+ *
+ * If everything matches, that's good.
+ * If a data stripe is corrupted beyond repair, report it.
+ * If a data stripe is corrupted, try recovery first and recheck the
+ * csum to determine whether it's recoverable or screwed up.
+ */
+static int scrub_one_full_stripe(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ u64 start, u64 *next_ret, int write)
+{
+ struct scrub_full_stripe *fstripe;
+ struct btrfs_map_block *map_block = NULL;
+ u32 stripe_len = BTRFS_STRIPE_LEN;
+ u64 bg_type;
+ u64 len;
+ int i;
+ int ret;
+
+ if (!next_ret) {
+ error("invalid argument for %s", __func__);
+ return -EINVAL;
+ }
+
+ ret = __btrfs_map_block_v2(fs_info, WRITE, start, stripe_len,
+ &map_block);
+ if (ret < 0) {
+ /* Let the caller skip the whole block group */
+ *next_ret = (u64)-1;
+ return ret;
+ }
+ start = map_block->start;
+ len = map_block->length;
+ *next_ret = start + len;
+
+ /*
+ * Step 0: Check if we need to scrub the full stripe
+ *
+ * If no extent lies in the full stripe, no need to check
+ */
+ ret = btrfs_check_extent_exists(fs_info, start, len);
+ if (ret < 0) {
+ free(map_block);
+ return ret;
+ }
+ /* No extents in range, no need to check */
+ if (ret == 0) {
+ free(map_block);
+ return 0;
+ }
+
+ bg_type = map_block->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+ if (bg_type != BTRFS_BLOCK_GROUP_RAID5 &&
+ bg_type != BTRFS_BLOCK_GROUP_RAID6) {
+ free(map_block);
+ return -EINVAL;
+ }
+
+ fstripe = alloc_full_stripe(map_block->num_stripes,
+ map_block->stripe_len);
+ if (!fstripe)
+ return -ENOMEM;
+
+ fstripe->logical_start = map_block->start;
+ fstripe->nr_stripes = map_block->num_stripes;
+ fstripe->stripe_len = stripe_len;
+ fstripe->bg_type = bg_type;
+
+ /*
+ * Step 1: Read out the whole full stripe
+ *
+ * Then we have the chance to exit early if too many devices are
+ * missing.
+ */
+ for (i = 0; i < map_block->num_stripes; i++) {
+ struct scrub_stripe *s_stripe = &fstripe->stripes[i];
+ struct btrfs_map_stripe *m_stripe = &map_block->stripes[i];
+
+ s_stripe->logical = m_stripe->logical;
+ s_stripe->fd = m_stripe->dev->fd;
+ s_stripe->physical = m_stripe->physical;
+
+ if (m_stripe->dev->fd == -1) {
+ s_stripe->dev_missing = 1;
+ record_corrupted_stripe(fstripe, i);
+ fstripe->err_missing_devs++;
+ continue;
+ }
+
+ ret = pread(m_stripe->dev->fd, s_stripe->data, stripe_len,
+ m_stripe->physical);
+ if (ret < stripe_len) {
+ record_corrupted_stripe(fstripe, i);
+ fstripe->err_read_stripes++;
+ continue;
+ }
+ }
+
+ ret = report_recoverablity(fstripe);
+ if (ret < 0)
+ goto out;
+
+ ret = recover_from_parities(fs_info, scrub_ctx, fstripe);
+ if (ret < 0) {
+ error("full stripe %llu CORRUPTED: failed to recover: %s\n",
+ fstripe->logical_start, strerror(-ret));
+ goto out;
+ }
+
+ /*
+ * Clear the corrupted stripe records since they are recovered;
+ * the later checker needs to reuse these members to record
+ * csum-mismatch stripes
+ */
+ clear_corrupted_stripe_record(fstripe);
+
+ /*
+ * Step 2: Check each data stripe against its csum
+ */
+ for (i = 0; i < map_block->num_stripes; i++) {
+ struct scrub_stripe *stripe = &fstripe->stripes[i];
+
+ if (!is_data_stripe(stripe))
+ continue;
+ ret = scrub_one_data_stripe(fs_info, scrub_ctx, stripe,
+ stripe_len);
+ if (ret < 0) {
+ fstripe->err_csum_dstripes++;
+ record_corrupted_stripe(fstripe, i);
+ }
+ }
+
+ ret = report_recoverablity(fstripe);
+ if (ret < 0)
+ goto out;
+
+ /*
+ * Recovered before, but no csum error
+ */
+ if (fstripe->err_csum_dstripes == 0 && fstripe->recovered) {
+ error(
+ "full stripe %llu RECOVERABLE: P/Q is good for recovery",
+ start);
+ ret = 0;
+ goto out;
+ }
+ /*
+ * No csum error, not recovered before.
+ *
+ * Only need to check if P/Q matches.
+ */
+ if (fstripe->err_csum_dstripes == 0 && !fstripe->recovered) {
+ ret = verify_parities(fs_info, scrub_ctx, fstripe);
+ if (ret < 0) {
+ error(
+ "full stripe %llu CORRUPTED: failed to check P/Q: %s",
+ start, strerror(-ret));
+ goto out;
+ }
+ if (ret > 0) {
+ if (write) {
+ ret = write_full_stripe(fstripe);
+ if (ret < 0)
+ error("failed to write full stripe %llu: %s",
+ start, strerror(-ret));
+ else
+ printf("full stripe %llu REPARIED: only P/Q mismatches, repaired\n",
+ start);
+ goto out;
+ } else {
+ printf("full stripe %llu RECOVERABLE: only P/Q is corrupted\n",
+ start);
+ ret = 0;
+ }
+ }
+ goto out;
+ }
+ /*
+ * Still csum error after recovery
+ *
+ * No means to fix it further, it's already screwed up.
+ */
+ if (fstripe->err_csum_dstripes && fstripe->recovered) {
+ error(
+ "full stripe %llu CORRUPTED: csum still mismatch after recovery",
+ start);
+ ret = -EIO;
+ goto out;
+ }
+
+ /* Csum mismatch, but we still have a chance to recover. */
+ ret = recover_from_parities(fs_info, scrub_ctx, fstripe);
+ if (ret < 0) {
+ error(
+ "full stripe %llu CORRUPTED: failed to recover: %s\n",
+ fstripe->logical_start, strerror(-ret));
+ goto out;
+ }
+
+ /* After recovery, recheck data stripe csum */
+ for (i = 0; i < 2; i++) {
+ int index = fstripe->corrupted_index[i];
+ struct scrub_stripe *stripe;
+
+ if (index == -1)
+ continue;
+ stripe = &fstripe->stripes[index];
+ ret = scrub_one_data_stripe(fs_info, scrub_ctx, stripe,
+ stripe_len);
+ if (ret < 0) {
+ error(
+ "full stripe %llu CORRUPTED: csum still mismatch after recovery",
+ start);
+ goto out;
+ }
+ }
+ if (write) {
+ ret = write_full_stripe(fstripe);
+ if (ret < 0)
+ error("failed to write full stripe %llu: %s",
+ start, strerror(-ret));
+ else
+ printf("full stripe %llu REPARIED: corrupted data with good P/Q, repaired\n",
+ start);
+ goto out;
+ }
+ printf(
+ "full stripe %llu RECOVERABLE: Data stripes corrupted, but P/Q is good\n",
+ start);
+
+out:
+ free_full_stripe(fstripe);
+ free(map_block);
+ return ret;
}
--
2.14.3
* [v6 14/16] btrfs-progs: scrub: Introduce function to check a whole block group
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (12 preceding siblings ...)
2018-01-05 11:01 ` [v6 13/16] btrfs-progs: scrub: Introduce a function to scrub one " Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 15/16] btrfs-progs: scrub: Introduce offline scrub function Gu Jinxiang
2018-01-05 11:01 ` [v6 16/16] btrfs-progs: add test for offline-scrub Gu Jinxiang
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, scrub_one_block_group(), to scrub a block group.
For Single/DUP/RAID0/RAID1/RAID10, we use the old mirror-number based
map_block, and check extent by extent.
For parity based profiles (RAID5/6), we use the new map_block_v2() and
check full stripe by full stripe.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
scrub.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/scrub.c b/scrub.c
index e474b18a..1f2fd56d 100644
--- a/scrub.c
+++ b/scrub.c
@@ -1198,3 +1198,95 @@ out:
free(map_block);
return ret;
}
+
+/*
+ * Scrub one block group.
+ *
+ * This function handles all profiles btrfs currently supports.
+ * Return 0 after scrubbing the block group; any errors found are
+ * recorded in scrub_ctx.
+ * Return <0 for a fatal error preventing the block group from being scrubbed.
+ */
+static int scrub_one_block_group(struct btrfs_fs_info *fs_info,
+ struct btrfs_scrub_progress *scrub_ctx,
+ struct btrfs_block_group_cache *bg_cache,
+ int write)
+{
+ struct btrfs_root *extent_root = fs_info->extent_root;
+ struct btrfs_path *path;
+ struct btrfs_key key;
+ u64 bg_start = bg_cache->key.objectid;
+ u64 bg_len = bg_cache->key.offset;
+ int ret;
+
+ if (bg_cache->flags &
+ (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+ u64 cur = bg_start;
+ u64 next;
+
+ while (cur < bg_start + bg_len) {
+ ret = scrub_one_full_stripe(fs_info, scrub_ctx, cur,
+ &next, write);
+ /* Ignore any non-fatal error */
+ if (ret < 0 && ret != -EIO) {
+ error("fatal error checking one full stripe at bytenr %llu: %s",
+ cur, strerror(-ret));
+ return ret;
+ }
+ cur = next;
+ }
+ /* Ignore any -EIO error, such error will be reported at last */
+ return 0;
+ }
+ /* Non-parity based profiles, check extent by extent */
+ key.objectid = bg_start;
+ key.type = 0;
+ key.offset = 0;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+ ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto out;
+ while (1) {
+ struct extent_buffer *eb = path->nodes[0];
+ int slot = path->slots[0];
+ u64 extent_start;
+ u64 extent_len;
+
+ btrfs_item_key_to_cpu(eb, &key, slot);
+ if (key.objectid >= bg_start + bg_len)
+ break;
+ if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+ key.type != BTRFS_METADATA_ITEM_KEY)
+ goto next;
+
+ extent_start = key.objectid;
+ if (key.type == BTRFS_METADATA_ITEM_KEY)
+ extent_len = extent_root->fs_info->nodesize;
+ else
+ extent_len = key.offset;
+
+ ret = scrub_one_extent(fs_info, scrub_ctx, path, extent_start,
+ extent_len, write);
+ if (ret < 0 && ret != -EIO) {
+ error("fatal error checking extent bytenr %llu len %llu: %s",
+ extent_start, extent_len, strerror(-ret));
+ goto out;
+ }
+ ret = 0;
+next:
+ ret = btrfs_next_extent_item(extent_root, path, bg_start +
+ bg_len);
+ if (ret < 0)
+ goto out;
+ if (ret > 0) {
+ ret = 0;
+ break;
+ }
+ }
+out:
+ btrfs_free_path(path);
+ return ret;
+}
--
2.14.3
* [v6 15/16] btrfs-progs: scrub: Introduce offline scrub function
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (13 preceding siblings ...)
2018-01-05 11:01 ` [v6 14/16] btrfs-progs: scrub: Introduce function to check a whole block group Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
2018-01-05 11:01 ` [v6 16/16] btrfs-progs: add test for offline-scrub Gu Jinxiang
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo, Su
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Now, btrfs-progs has a kernel scrub equivalent.
A new option, --offline, is added to "btrfs scrub start".
If --offline is given, btrfs scrub will act like kernel scrub: it
checks every copy of every extent and reports corrupted data and
whether it's recoverable.
The advantages compared to kernel scrub are:
1) No race
Unlike kernel scrub, which is done in parallel, offline scrub is done
by a single thread.
Although it may be slower than the kernel one, it's safer and raises
no false alerts.
2) Correctness
The kernel has a known bug (fix submitted) which recovers RAID5/6
data but screws up P/Q, due to hard-coded assumptions in the kernel.
In btrfs-progs there are no page constraints and (almost) no memory
size limits, so we can focus on the scrub itself and keep things
simpler.
The new offline scrub can detect and report P/Q corruption together
with a recoverability report, while the kernel only reports data
stripe errors.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Su <suy.fnst@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
Documentation/btrfs-scrub.asciidoc | 9 +++
cmds-scrub.c | 116 +++++++++++++++++++++++++++++++++++--
ctree.h | 6 ++
scrub.c | 71 +++++++++++++++++++++++
utils.h | 6 ++
5 files changed, 204 insertions(+), 4 deletions(-)
diff --git a/Documentation/btrfs-scrub.asciidoc b/Documentation/btrfs-scrub.asciidoc
index eb90a1c4..49527c2a 100644
--- a/Documentation/btrfs-scrub.asciidoc
+++ b/Documentation/btrfs-scrub.asciidoc
@@ -78,6 +78,15 @@ set IO priority classdata (see `ionice`(1) manpage)
force starting new scrub even if a scrub is already running,
this can useful when scrub status file is damaged and reports a running
scrub although it is not, but should not normally be necessary
+--offline::::
+Do offline scrub.
+NOTE: it's experimental and repair is not supported yet.
+--progress::::
+Show progress status while doing offline scrub. (Default)
+NOTE: it's only useful with option --offline.
+--no-progress::::
+Don't show progress status while doing offline scrub.
+NOTE: it's only useful with option --offline.
*status* [-d] <path>|<device>::
Show status of a running scrub for the filesystem identified by 'path' or
diff --git a/cmds-scrub.c b/cmds-scrub.c
index 5388fdcf..063b4dfd 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -36,12 +36,14 @@
#include <signal.h>
#include <stdarg.h>
#include <limits.h>
+#include <getopt.h>
#include "ctree.h"
#include "ioctl.h"
#include "utils.h"
#include "volumes.h"
#include "disk-io.h"
+#include "task-utils.h"
#include "commands.h"
#include "help.h"
@@ -217,6 +219,32 @@ static void add_to_fs_stat(struct btrfs_scrub_progress *p,
_SCRUB_FS_STAT_MIN(ss, finished, fs_stat);
}
+static void *print_offline_status(void *p)
+{
+ struct task_context *ctx = p;
+ const char work_indicator[] = {'.', 'o', 'O', 'o' };
+ uint32_t count = 0;
+
+ task_period_start(ctx->info, 1000 /* 1s */);
+
+ while (1) {
+ printf("Doing offline scrub [%c] [%llu/%llu]\r",
+ work_indicator[count % 4], ctx->cur, ctx->all);
+ count++;
+ fflush(stdout);
+ task_period_wait(ctx->info);
+ }
+ return NULL;
+}
+
+static int print_offline_return(void *p)
+{
+ printf("\n");
+ fflush(stdout);
+
+ return 0;
+}
+
static void init_fs_stat(struct scrub_fs_stat *fs_stat)
{
memset(fs_stat, 0, sizeof(*fs_stat));
@@ -1100,7 +1128,7 @@ static const char * const cmd_scrub_resume_usage[];
static int scrub_start(int argc, char **argv, int resume)
{
- int fdmnt;
+ int fdmnt = -1;
int prg_fd = -1;
int fdres = -1;
int ret;
@@ -1124,10 +1152,14 @@ static int scrub_start(int argc, char **argv, int resume)
int n_start = 0;
int n_skip = 0;
int n_resume = 0;
+ int offline = 0;
+ int progress_set = -1;
struct btrfs_ioctl_fs_info_args fi_args;
struct btrfs_ioctl_dev_info_args *di_args = NULL;
struct scrub_progress *sp = NULL;
struct scrub_fs_stat fs_stat;
+ struct task_context task = {0};
+ struct btrfs_fs_info *fs_info = NULL;
struct timeval tv;
struct sockaddr_un addr = {
.sun_family = AF_UNIX,
@@ -1147,7 +1179,18 @@ static int scrub_start(int argc, char **argv, int resume)
int force = 0;
int nothing_to_resume = 0;
- while ((c = getopt(argc, argv, "BdqrRc:n:f")) != -1) {
+ enum { GETOPT_VAL_OFFLINE = 257,
+ GETOPT_VAL_PROGRESS,
+ GETOPT_VAL_NO_PROGRESS};
+ static const struct option long_options[] = {
+ { "offline", no_argument, NULL, GETOPT_VAL_OFFLINE},
+ { "progress", no_argument, NULL, GETOPT_VAL_PROGRESS},
+ { "no-progress", no_argument, NULL, GETOPT_VAL_NO_PROGRESS},
+ { NULL, 0, NULL, 0}
+ };
+
+ while ((c = getopt_long(argc, argv, "BdqrRc:n:f", long_options,
+ NULL)) != -1) {
switch (c) {
case 'B':
do_background = 0;
@@ -1175,6 +1218,15 @@ static int scrub_start(int argc, char **argv, int resume)
case 'f':
force = 1;
break;
+ case GETOPT_VAL_OFFLINE:
+ offline = 1;
+ break;
+ case GETOPT_VAL_PROGRESS:
+ progress_set = 1;
+ break;
+ case GETOPT_VAL_NO_PROGRESS:
+ progress_set = 0;
+ break;
case '?':
default:
usage(resume ? cmd_scrub_resume_usage :
@@ -1189,6 +1241,53 @@ static int scrub_start(int argc, char **argv, int resume)
cmd_scrub_start_usage);
}
+ if (progress_set != -1 && !offline)
+ warning("Options --progress and --no-progress only work with --offline, ignored.");
+
+ if (offline) {
+ unsigned ctree_flags = OPEN_CTREE_EXCLUSIVE;
+
+ ret = check_mounted(argv[optind]);
+ if (ret < 0) {
+ error("could not check mount status: %s", strerror(-ret));
+ err |= !!ret;
+ goto out;
+ } else if (ret) {
+ error("%s is currently mounted, aborting", argv[optind]);
+ ret = -EBUSY;
+ err |= !!ret;
+ goto out;
+ }
+
+ if (!do_background || do_wait || do_print ||
+ do_stats_per_dev || do_quiet || print_raw ||
+ ioprio_class != IOPRIO_CLASS_IDLE || ioprio_classdata ||
+ force)
+ warning("Offline scrub doesn't support extra options other than -r");
+
+ if (!readonly)
+ ctree_flags |= OPEN_CTREE_WRITES;
+ fs_info = open_ctree_fs_info(argv[optind], 0, 0, 0, ctree_flags);
+ if (!fs_info) {
+ error("cannot open file system");
+ ret = -EIO;
+ err = 1;
+ goto out;
+ }
+
+ if (progress_set == 1) {
+ task.info = task_init(print_offline_status,
+ print_offline_return, &task);
+ ret = btrfs_scrub(fs_info, &task, !readonly);
+ task_deinit(task.info);
+ } else {
+ ret = btrfs_scrub(fs_info, NULL, !readonly);
+ }
+
+ goto out;
+ }
+
+
spc.progress = NULL;
if (do_quiet && do_print)
do_print = 0;
@@ -1545,7 +1644,10 @@ out:
if (sock_path[0])
unlink(sock_path);
}
- close_file_or_dir(fdmnt, dirstream);
+ if (fdmnt >= 0)
+ close_file_or_dir(fdmnt, dirstream);
+ if (fs_info)
+ close_ctree_fs_info(fs_info);
if (err)
return 1;
@@ -1563,9 +1665,10 @@ out:
}
static const char * const cmd_scrub_start_usage[] = {
- "btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] <path>|<device>",
+ "btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] [--offline] [--progress|--no-progress] <path>|<device>",
"Start a new scrub. If a scrub is already running, the new one fails.",
"",
+ "Online (kernel) scrub options:",
"-B do not background",
"-d stats per device (-B only)",
"-q be quiet",
@@ -1575,6 +1678,11 @@ static const char * const cmd_scrub_start_usage[] = {
"-n set ioprio classdata (see ionice(1) manpage)",
"-f force starting new scrub even if a scrub is already running",
" this is useful when scrub stats record file is damaged",
+ "",
+ "Offline scrub options:",
+ "--offline start an offline scrub; other options are not supported",
+ "--progress show progress status (default), only works with --offline",
+ "--no-progress do not show progress status, only works with --offline",
NULL
};
diff --git a/ctree.h b/ctree.h
index 7d58cb33..f30bc6b3 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2767,4 +2767,10 @@ int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
int btrfs_read_data_csums(struct btrfs_fs_info *fs_info, u64 start, u64 len,
void *csum_ret, unsigned long *bitmap_ret);
+
+/* scrub.c */
+struct task_context;
+int btrfs_scrub(struct btrfs_fs_info *fs_info, struct task_context *ctx,
+ int write);
+
#endif
diff --git a/scrub.c b/scrub.c
index 1f2fd56d..cde9e882 100644
--- a/scrub.c
+++ b/scrub.c
@@ -19,6 +19,7 @@
#include "disk-io.h"
#include "utils.h"
#include "kernel-lib/bitops.h"
+#include "task-utils.h"
#include "kernel-lib/raid56.h"
/*
@@ -1290,3 +1291,73 @@ out:
btrfs_free_path(path);
return ret;
}
+
+int btrfs_scrub(struct btrfs_fs_info *fs_info, struct task_context *task,
+ int write)
+{
+ u64 bg_nr = 0;
+ struct btrfs_block_group_cache *bg_cache;
+ struct btrfs_scrub_progress scrub_ctx = {0};
+ int ret = 0;
+
+ ASSERT(fs_info);
+
+ bg_cache = btrfs_lookup_first_block_group(fs_info, 0);
+ if (!bg_cache) {
+ error("no block group is found");
+ return -ENOENT;
+ }
+ ++bg_nr;
+
+ if (task) {
+ /* get block group numbers for progress */
+ while (1) {
+ u64 bg_offset = bg_cache->key.objectid +
+ bg_cache->key.offset;
+ bg_cache = btrfs_lookup_first_block_group(fs_info,
+ bg_offset);
+ if (!bg_cache)
+ break;
+ ++bg_nr;
+ }
+ task->all = bg_nr;
+ task->cur = 1;
+ task_start(task->info);
+
+ bg_cache = btrfs_lookup_first_block_group(fs_info, 0);
+ }
+
+ while (1) {
+ ret = scrub_one_block_group(fs_info, &scrub_ctx, bg_cache,
+ write);
+ if (ret < 0 && ret != -EIO)
+ break;
+ if (task)
+ task->cur++;
+
+ bg_cache = btrfs_lookup_first_block_group(fs_info,
+ bg_cache->key.objectid + bg_cache->key.offset);
+ if (!bg_cache)
+ break;
+ }
+
+ if (task)
+ task_stop(task->info);
+
+ printf("Scrub result:\n");
+ printf("Tree bytes scrubbed: %llu\n", scrub_ctx.tree_bytes_scrubbed);
+ printf("Tree extents scrubbed: %llu\n", scrub_ctx.tree_extents_scrubbed);
+ printf("Data bytes scrubbed: %llu\n", scrub_ctx.data_bytes_scrubbed);
+ printf("Data extents scrubbed: %llu\n", scrub_ctx.data_extents_scrubbed);
+ printf("Data bytes without csum: %llu\n", scrub_ctx.csum_discards *
+ fs_info->sectorsize);
+ printf("Read error: %llu\n", scrub_ctx.read_errors);
+ printf("Verify error: %llu\n", scrub_ctx.verify_errors);
+ printf("Csum error: %llu\n", scrub_ctx.csum_errors);
+ if (scrub_ctx.csum_errors || scrub_ctx.read_errors ||
+ scrub_ctx.uncorrectable_errors || scrub_ctx.verify_errors)
+ ret = 1;
+ else
+ ret = 0;
+ return ret;
+}
diff --git a/utils.h b/utils.h
index 5d869a50..17698f38 100644
--- a/utils.h
+++ b/utils.h
@@ -195,4 +195,10 @@ u64 rand_u64(void);
unsigned int rand_range(unsigned int upper);
void init_rand_seed(u64 seed);
+struct task_context {
+ u64 cur;
+ u64 all;
+ struct task_info *info;
+};
+
#endif
--
2.14.3
* [v6 16/16] btrfs-progs: add test for offline-scrub
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
` (14 preceding siblings ...)
2018-01-05 11:01 ` [v6 15/16] btrfs-progs: scrub: Introduce offline scrub function Gu Jinxiang
@ 2018-01-05 11:01 ` Gu Jinxiang
15 siblings, 0 replies; 17+ messages in thread
From: Gu Jinxiang @ 2018-01-05 11:01 UTC (permalink / raw)
To: linux-btrfs
Add a test for offline-scrub.
The process of this test case:
1) create a filesystem with profile raid10
2) mount the filesystem, create a file in the mount point, and write
some data to the file
3) get the logical address of the file's extent data
4) get the physical address for that logical address
5) overwrite the contents at that physical address
6) use offline scrub to check and repair it
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
Makefile | 6 ++-
tests/scrub-tests.sh | 43 +++++++++++++++++++
tests/scrub-tests/001-offline-scrub-raid10/test.sh | 50 ++++++++++++++++++++++
3 files changed, 98 insertions(+), 1 deletion(-)
create mode 100755 tests/scrub-tests.sh
create mode 100755 tests/scrub-tests/001-offline-scrub-raid10/test.sh
diff --git a/Makefile b/Makefile
index fa3ebc86..0a3060f5 100644
--- a/Makefile
+++ b/Makefile
@@ -322,6 +322,10 @@ test-cli: btrfs
@echo " [TEST] cli-tests.sh"
$(Q)bash tests/cli-tests.sh
+test-scrub: btrfs mkfs.btrfs
+ @echo " [TEST] scrub-tests.sh"
+ $(Q)bash tests/scrub-tests.sh
+
test-clean:
@echo "Cleaning tests"
$(Q)bash tests/clean-tests.sh
@@ -332,7 +336,7 @@ test-inst: all
$(MAKE) $(MAKEOPTS) DESTDIR=$$tmpdest install && \
$(RM) -rf -- $$tmpdest
-test: test-fsck test-mkfs test-convert test-misc test-fuzz test-cli
+test: test-fsck test-mkfs test-convert test-misc test-fuzz test-cli test-scrub
#
# NOTE: For static compiles, you need to have all the required libs
diff --git a/tests/scrub-tests.sh b/tests/scrub-tests.sh
new file mode 100755
index 00000000..697137f4
--- /dev/null
+++ b/tests/scrub-tests.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+#
+# btrfs scrub tests
+
+LANG=C
+SCRIPT_DIR=$(dirname $(readlink -f "$0"))
+TOP=$(readlink -f "$SCRIPT_DIR/../")
+TEST_DEV=${TEST_DEV:-}
+RESULTS="$TOP/tests/scrub-tests-results.txt"
+IMAGE="$TOP/tests/test.img"
+
+source "$TOP/tests/common"
+
+export TOP
+export RESULTS
+export LANG
+export IMAGE
+export TEST_DEV
+
+rm -f "$RESULTS"
+
+check_prereq btrfs
+check_kernel_support
+
+# The tests are driven by their custom script called 'test.sh'
+
+for i in $(find "$TOP/tests/scrub-tests" -maxdepth 1 -mindepth 1 -type d \
+ ${TEST:+-name "$TEST"} | sort)
+do
+ echo " [TEST/scrub] $(basename $i)"
+ cd "$i"
+ echo "=== Entering $i" >> "$RESULTS"
+ if [ -x test.sh ]; then
+ ./test.sh
+ if [ $? -ne 0 ]; then
+ if [[ $TEST_LOG =~ dump ]]; then
+ cat "$RESULTS"
+ fi
+ _fail "test failed for case $(basename $i)"
+ fi
+ fi
+ cd "$TOP"
+done
diff --git a/tests/scrub-tests/001-offline-scrub-raid10/test.sh b/tests/scrub-tests/001-offline-scrub-raid10/test.sh
new file mode 100755
index 00000000..c609d870
--- /dev/null
+++ b/tests/scrub-tests/001-offline-scrub-raid10/test.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+
+source $TOP/tests/common
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+check_prereq btrfs-debug-tree
+check_prereq btrfs-map-logical
+
+setup_root_helper
+
+setup_loopdevs 4
+prepare_loopdevs
+
+dev1=${loopdevs[1]}
+file=$TEST_MNT/file
+
+mkfs_multi()
+{
+ run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $@ ${loopdevs[@]}
+}
+
+#create filesystem
+mkfs_multi -d raid10 -m raid10
+run_check $SUDO_HELPER mount -t btrfs $dev1 "$TEST_MNT"
+
+#write some data
+run_check $SUDO_HELPER touch $file
+run_check $SUDO_HELPER dd if=/dev/zero of=$file bs=64K count=1
+run_check sync -f $file
+
+#get the extent data's logical address of $file
+logical=$($SUDO_HELPER $TOP/btrfs-debug-tree -t 5 $dev1 | grep -oP '(?<=byte\s)\d+')
+
+#get the first physical address and device of $file's data
+read physical dev < <($SUDO_HELPER $TOP/btrfs-map-logical -l $logical $dev1 | head -1 | cut -d ' ' -f6,8)
+
+#then modify the data
+run_check $SUDO_HELPER dd if=/dev/urandom of=$dev seek=$(($physical/65536)) bs=64K count=1
+run_check $SUDO_HELPER sync
+
+run_check $SUDO_HELPER umount "$TEST_MNT"
+log=$(run_check_stdout $SUDO_HELPER $TOP/btrfs scrub start --offline $dev1)
+cleanup_loopdevs
+
+#check result
+result=$(echo "$log" | grep 'len 65536 REPARIED: has corrupted mirror, repaired')
+if [[ -z "$result" ]]; then
+	_fail "scrub repair failed"
+fi
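One note on the corruption step in the test above: dd seeks in whole blocks of size bs, so the physical byte offset reported by btrfs-map-logical is converted to 64KiB units, which silently assumes the extent's physical start is 64KiB-aligned. A quick sketch of the arithmetic, using a hypothetical offset (the test reads the real one from btrfs-map-logical):

```shell
# Convert a physical byte offset to a dd seek count in 64KiB blocks,
# and check the alignment assumption. The offset is hypothetical.
physical=298844160
bs=65536
seek=$((physical / bs))
aligned=$((physical % bs == 0))
echo "seek=$seek aligned=$aligned"
```

If the offset were not 64KiB-aligned, the integer division would truncate and dd would overwrite the wrong 64KiB block, corrupting a neighbouring stripe instead of the intended one.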
--
2.14.3
Thread overview: 17+ messages
2018-01-05 11:01 [v6 00/16] Btrfs-progs offline scrub Gu Jinxiang
2018-01-05 11:01 ` [v6 01/16] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Gu Jinxiang
2018-01-05 11:01 ` [v6 02/16] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Gu Jinxiang
2018-01-05 11:01 ` [v6 03/16] btrfs-progs: csum: Introduce function to read out data csums Gu Jinxiang
2018-01-05 11:01 ` [v6 04/16] btrfs-progs: scrub: Introduce structures to support offline scrub for RAID56 Gu Jinxiang
2018-01-05 11:01 ` [v6 05/16] btrfs-progs: scrub: Introduce functions to scrub mirror based tree block Gu Jinxiang
2018-01-05 11:01 ` [v6 06/16] btrfs-progs: scrub: Introduce functions to scrub mirror based data blocks Gu Jinxiang
2018-01-05 11:01 ` [v6 07/16] btrfs-progs: scrub: Introduce function to scrub one mirror-based extent Gu Jinxiang
2018-01-05 11:01 ` [v6 08/16] btrfs-progs: scrub: Introduce function to scrub one data stripe Gu Jinxiang
2018-01-05 11:01 ` [v6 09/16] btrfs-progs: scrub: Introduce function to verify parities Gu Jinxiang
2018-01-05 11:01 ` [v6 10/16] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Gu Jinxiang
2018-01-05 11:01 ` [v6 11/16] btrfs-progs: scrub: Introduce function to recover data parity Gu Jinxiang
2018-01-05 11:01 ` [v6 12/16] btrfs-progs: scrub: Introduce helper to write a full stripe Gu Jinxiang
2018-01-05 11:01 ` [v6 13/16] btrfs-progs: scrub: Introduce a function to scrub one " Gu Jinxiang
2018-01-05 11:01 ` [v6 14/16] btrfs-progs: scrub: Introduce function to check a whole block group Gu Jinxiang
2018-01-05 11:01 ` [v6 15/16] btrfs-progs: scrub: Introduce offline scrub function Gu Jinxiang
2018-01-05 11:01 ` [v6 16/16] btrfs-progs: add test for offline-scrub Gu Jinxiang