[PATCH] btrfs-progs: wipe all copies of the stale superblock

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] btrfs-progs: wipe all copies of the stale superblock
@ 2018-03-21 10:19 Anand Jain
  2018-03-21 12:53 ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Anand Jain @ 2018-03-21 10:19 UTC (permalink / raw)
  To: linux-btrfs

Recovering from the other copies of the superblock is fundamental
to BTRFS, which provides resilient against single LBA failure in
the DUP group profile.

Further, in the test case [1] it shows a good but stale superblock
at copy#2. This will lead to confusion during auto/manual recovery.
So strictly speaking if a device has three copies of the superblock
and if we have permission to wipe it (-f option) then we have to wipe
all the copies of the superblock.

If there is any objection to writing beyond mkfs.btrfs -b <size>, then
we could fail the mkfs/dev-add/dev-replace operation and ask the user
the wipe using dd command manually, as anyway there is no use of keeping
only the copy#2 when the new FS has been written.

Test case: Note that copy#2 fsid is different from primary and copy#1
fsid.

mkfs.btrfs -qf /dev/mapper/vg-lv && \
mkfs.btrfs -qf -b1G /dev/mapper/vg-lv && \
btrfs in dump-super -a /dev/mapper/vg-lv | grep '.fsid|superblock:'

superblock: bytenr=65536, device=/dev/mapper/vg-lv
dev_item.fsid		ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
superblock: bytenr=67108864, device=/dev/mapper/vg-lv
dev_item.fsid		ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
superblock: bytenr=274877906944, device=/dev/mapper/vg-lv
dev_item.fsid		b97a9206-593b-4933-a424-c6a6ee23fe7c [match]

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 Hope with this we can patch the kernel to auto recover from the
 failed primary SB. In the earlier discussion on that, I think we
 are scrutinizing the wrong side (kernel) of the problem.
 Also, we need to fail the mount if all the copies of the SB do
 not have the same fsid.

 utils.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/utils.c b/utils.c
index 00020e1d6bdf..9c027f77d9c1 100644
--- a/utils.c
+++ b/utils.c
@@ -365,6 +365,41 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		return 1;
 	}
 
+	/*
+	 * Check for the BTRFS SB copies up until btrfs_device_size() and zero
+	 * it. So that kernel (or user for the manual recovery) don't have to
+	 * confuse with the stale SB copy during recovery.
+	 */
+	if (block_count != btrfs_device_size(fd, &st)) {
+		for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+			struct btrfs_super_block *disk_super;
+			char buf[BTRFS_SUPER_INFO_SIZE];
+			disk_super = (struct btrfs_super_block *)buf;
+
+			/* Already zeroed above */
+			if (btrfs_sb_offset(i) < block_count)
+				continue;
+
+			/* Beyond actual disk size */
+			if (btrfs_sb_offset(i) >= btrfs_device_size(fd, &st))
+				continue;
+
+			/* Does not contain any stale SB */
+			if (btrfs_read_dev_super(fd, disk_super,
+						 btrfs_sb_offset(i), 0))
+				continue;
+
+			ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
+						BTRFS_SUPER_INFO_SIZE,
+						btrfs_device_size(fd, &st));
+			if (ret < 0) {
+				error("failed to zero device '%s' bytenr %llu: %s",
+					file, btrfs_sb_offset(i), strerror(-ret));
+				return 1;
+			}
+		}
+	}
+
 	*block_count_ret = block_count;
 	return 0;
 }
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs-progs: wipe all copies of the stale superblock
  2018-03-21 10:19 [PATCH] btrfs-progs: wipe all copies of the stale superblock Anand Jain
@ 2018-03-21 12:53 ` Nikolay Borisov
  2018-03-22 12:14   ` Anand Jain
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2018-03-21 12:53 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 21.03.2018 12:19, Anand Jain wrote:
> Recovering from the other copies of the superblock is fundamental
> to BTRFS, which provides resilient against single LBA failure in
nit: s/resilient/resilience.
> the DUP group profile.

Furthermore, the number of superblock copies doesn't really depend on
the group profile of either the data or metadata so the second part of
that statement is incorrect. All in all I don't think this paragraph
really brings any value so I suggest removing it altogether.

> 
> Further, in the test case [1] it shows a good but stale superblock
> at copy#2. This will lead to confusion during auto/manual recovery.
> So strictly speaking if a device has three copies of the superblock
> and if we have permission to wipe it (-f option) then we have to wipe
> all the copies of the superblock.

The problem is not really described in human-readable format. The issue
is that if we re-create a smaller btrfs filesystem on a driver which
contained a previous filesystem then it's possible to have old fs and
new fs superblocks to co-exist. Also the title of the patch needs
rewording to put the changes in context i.e. :

"Wipe possible superblock stale copies on mkfs" or some such.

> 
> If there is any objection to writing beyond mkfs.btrfs -b <size>, then
> we could fail the mkfs/dev-add/dev-replace operation and ask the user
> the wipe using dd command manually, as anyway there is no use of keeping
> only the copy#2 when the new FS has been written.

This sentence also doesn't belong in the changelog, you are asking of
our opinion which is fine, but put it under the scissors line in that case.

> 
> Test case: Note that copy#2 fsid is different from primary and copy#1
> fsid.
> 
> mkfs.btrfs -qf /dev/mapper/vg-lv && \
> mkfs.btrfs -qf -b1G /dev/mapper/vg-lv && \
> btrfs in dump-super -a /dev/mapper/vg-lv | grep '.fsid|superblock:'
> 
> superblock: bytenr=65536, device=/dev/mapper/vg-lv
> dev_item.fsid		ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
> superblock: bytenr=67108864, device=/dev/mapper/vg-lv
> dev_item.fsid		ebc67d01-7fc5-43f0-90b4-d1925002551e [match]
> superblock: bytenr=274877906944, device=/dev/mapper/vg-lv
> dev_item.fsid		b97a9206-593b-4933-a424-c6a6ee23fe7c [match]
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  Hope with this we can patch the kernel to auto recover from the
>  failed primary SB. In the earlier discussion on that, I think we
>  are scrutinizing the wrong side (kernel) of the problem.
>  Also, we need to fail the mount if all the copies of the SB do
>  not have the same fsid.
> 
>  utils.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/utils.c b/utils.c
> index 00020e1d6bdf..9c027f77d9c1 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -365,6 +365,41 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
>  		return 1;
>  	}
>  
> +	/*
> +	 * Check for the BTRFS SB copies up until btrfs_device_size() and zero
> +	 * it. So that kernel (or user for the manual recovery) don't have to
> +	 * confuse with the stale SB copy during recovery.
> +	 */
> +	if (block_count != btrfs_device_size(fd, &st)) {
> +		for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> +			struct btrfs_super_block *disk_super;
> +			char buf[BTRFS_SUPER_INFO_SIZE];
> +			disk_super = (struct btrfs_super_block *)buf;
> +
> +			/* Already zeroed above */
> +			if (btrfs_sb_offset(i) < block_count)
> +				continue;
> +
> +			/* Beyond actual disk size */
> +			if (btrfs_sb_offset(i) >= btrfs_device_size(fd, &st))
> +				continue;
> +
> +			/* Does not contain any stale SB */
> +			if (btrfs_read_dev_super(fd, disk_super,
> +						 btrfs_sb_offset(i), 0))
> +				continue;
> +
> +			ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
> +						BTRFS_SUPER_INFO_SIZE,
> +						btrfs_device_size(fd, &st));
> +			if (ret < 0) {
> +				error("failed to zero device '%s' bytenr %llu: %s",
> +					file, btrfs_sb_offset(i), strerror(-ret));
> +				return 1;
> +			}
> +		}
> +	}

A better place for this code is really inside btrfs_wipe_existing_sb.
Furthermore looking at libblkid it supports a way to wipe multiple
superblocks by way of: blkid_do_wipe. I.e
https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/v2.21/libblkid-docs/libblkid-Low-level-tags.html#blkid-do-wipe

Apparently this functionality has been in libblkid for quite some time
according to this blog post -
http://karelzak.blogspot.bg/2011/11/wipefs8-improvements.html (Karel Zak
is the person mainting various system libraries).

So can't we at the very least:

1) Remove some of our (open coded) implementation of
btrfs_wipe_existing_sb and use the generic blkid implementation.

2) Incorporate your code into btrfs_wipe_existing_sb OR extend the usage
of the generic blkid_do_wipe to wipe multiple copies of the sb ?

> +
>  	*block_count_ret = block_count;
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs-progs: wipe all copies of the stale superblock
  2018-03-21 12:53 ` Nikolay Borisov
@ 2018-03-22 12:14   ` Anand Jain
  2018-03-22 12:25     ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Anand Jain @ 2018-03-22 12:14 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



  Ok. Will update the change log.

> A better place for this code is really inside btrfs_wipe_existing_sb.
> Furthermore looking at libblkid it supports a way to wipe multiple
> superblocks by way of: blkid_do_wipe. I.e
> https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/v2.21/libblkid-docs/libblkid-Low-level-tags.html#blkid-do-wipe
> 
> Apparently this functionality has been in libblkid for quite some time
> according to this blog post -
> http://karelzak.blogspot.bg/2011/11/wipefs8-improvements.html (Karel Zak
> is the person mainting various system libraries).
> 
> So can't we at the very least:
> 
> 1) Remove some of our (open coded) implementation of
> btrfs_wipe_existing_sb and use the generic blkid implementation.
> 
> 2) Incorporate your code into btrfs_wipe_existing_sb OR extend the usage
> of the generic blkid_do_wipe to wipe multiple copies of the sb ?

  In btrfs_prepare_device() we have already zeroed the SBs which were
  within the block count. The btrfs_wipe_existing_sb() is to handle the
  other type of FS SBs. Definitely, we can consolidate all this into a
  single function using libblkid. However, it can be done in a separate
  patch.

Thanks, Anand

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs-progs: wipe all copies of the stale superblock
  2018-03-22 12:14   ` Anand Jain
@ 2018-03-22 12:25     ` Nikolay Borisov
  2018-03-22 13:08       ` Anand Jain
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2018-03-22 12:25 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs



On 22.03.2018 14:14, Anand Jain wrote:
> 
> 
>  Ok. Will update the change log.
> 
>> A better place for this code is really inside btrfs_wipe_existing_sb.
>> Furthermore looking at libblkid it supports a way to wipe multiple
>> superblocks by way of: blkid_do_wipe. I.e
>> https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/v2.21/libblkid-docs/libblkid-Low-level-tags.html#blkid-do-wipe
>>
>>
>> Apparently this functionality has been in libblkid for quite some time
>> according to this blog post -
>> http://karelzak.blogspot.bg/2011/11/wipefs8-improvements.html (Karel Zak
>> is the person mainting various system libraries).
>>
>> So can't we at the very least:
>>
>> 1) Remove some of our (open coded) implementation of
>> btrfs_wipe_existing_sb and use the generic blkid implementation.
>>
>> 2) Incorporate your code into btrfs_wipe_existing_sb OR extend the usage
>> of the generic blkid_do_wipe to wipe multiple copies of the sb ?
> 
>  In btrfs_prepare_device() we have already zeroed the SBs which were
>  within the block count. The btrfs_wipe_existing_sb() is to handle the
>  other type of FS SBs. Definitely, we can consolidate all this into a
>  single function using libblkid. However, it can be done in a separate
>  patch.

So I misread the code in the beginning, so the actual btrfs SB are being
wiped out by the for loop which calls zer_dev_clamped. And in there all
superblocks between  0 - block_count are zeored. And block_count is
actual physical size of the device, right. And this value can further be
limited by the max_block_count being passed, which in mkfs case is
what's passed to -b.

So instead of adding more code, why not just call zero_dev_clamped with
the return value of btrfs_device_size rather than the possibly clamped
one block_count? That you always unconditionally will be sure to zero
out all the sb which are on the device, irrespective of the size of the
fs? Something like :

diff --git a/utils.c b/utils.c
index 715bab0ebfb5..17df72c3a1d5 100644
--- a/utils.c
+++ b/utils.c
@@ -316,7 +316,7 @@ static int btrfs_wipe_existing_sb(int fd)
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
                u64 max_block_count, unsigned opflags)
 {
-       u64 block_count;
+       u64 block_count, device_size;
        struct stat st;
        int i, ret;

@@ -326,7 +326,7 @@ int btrfs_prepare_device(int fd, const char *file,
u64 *block_count_ret,
                return 1;
        }

-       block_count = btrfs_device_size(fd, &st);
+       block_count = device_size = btrfs_device_size(fd, &st);
        if (block_count == 0) {
                error("unable to determine size of %s", file);
                return 1;
@@ -351,7 +351,7 @@ int btrfs_prepare_device(int fd, const char *file,
u64 *block_count_ret,
        ret = zero_dev_clamped(fd, 0, ZERO_DEV_BYTES, block_count);
        for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++)
                ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
-                                      BTRFS_SUPER_INFO_SIZE, block_count);
+                                      BTRFS_SUPER_INFO_SIZE, device_size);


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs-progs: wipe all copies of the stale superblock
  2018-03-22 12:25     ` Nikolay Borisov
@ 2018-03-22 13:08       ` Anand Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2018-03-22 13:08 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



> So I misread the code in the beginning, so the actual btrfs SB are being
> wiped out by the for loop which calls zer_dev_clamped. And in there all
> superblocks between  0 - block_count are zeored. And block_count is
> actual physical size of the device, right. And this value can further be
> limited by the max_block_count being passed, which in mkfs case is
> what's passed to -b.
> 
> So instead of adding more code, why not just call zero_dev_clamped with
> the return value of btrfs_device_size rather than the possibly clamped
> one block_count? That you always unconditionally will be sure to zero
> out all the sb which are on the device, irrespective of the size of the
> fs? Something like :

Hm. Actually I didn't want to zero unless I find a valid SB at the 
copy#2. I was kind of wanted suggestions if there is something that I am 
missing if I write beyond the -b <blockcount>. I don't find anything 
wrong though. I look at it as a kind of wipeall.

Thanks, Anand

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-03-22 13:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-21 10:19 [PATCH] btrfs-progs: wipe all copies of the stale superblock Anand Jain
2018-03-21 12:53 ` Nikolay Borisov
2018-03-22 12:14   ` Anand Jain
2018-03-22 12:25     ` Nikolay Borisov
2018-03-22 13:08       ` Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).