[PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
@ 2019-03-27  9:46 Qu Wenruo
  2019-03-27  9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27  9:46 UTC (permalink / raw)
  To: linux-btrfs

This urgent patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/flush_super
Which is based on v4.20.2.

Before this patch, btrfs-progs writes to the fs has no barrier at all.
All metadata and superblock are just buffered write, no barrier between
super blocks and metadata writes at all.

No wonder why even clear space cache can cause serious transid
corruption to the originally good fs.

Please merge this fix as soon as possible as I really don't want to see
btrfs-progs corrupting any fs any more.

Changelog:
v1.1:
- Use one line error report other than 2 seperate lines.

Qu Wenruo (2):
  btrfs-progs: disk-io: Make super block write error easier to read
  btrfs-progs: disk-io: Flush to ensure super block write is FUA

 disk-io.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 11 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read
  2019-03-27  9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo
@ 2019-03-27  9:46 ` Qu Wenruo
  2019-03-27 11:34   ` Nikolay Borisov
  2019-03-27  9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo
  2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27  9:46 UTC (permalink / raw)
  To: linux-btrfs

When we failed to write super blocks, we just output something like:
  WARNING: failed to write sb: I/O error
Or
  WARNING: failed to write all sb data

There is no info about which device failed and there are two different
error message for the same write error.

This patch will change it to something more detailed:
ERROR: failed to write super block for devid 1: write error: I/O error

This provides the basis for later super block flush error handling.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 disk-io.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 797b9b79ea3c..f7fb7026cd94 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1599,8 +1599,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 		ret = pwrite64(device->fd, fs_info->super_copy,
 				BTRFS_SUPER_INFO_SIZE,
 				fs_info->super_bytenr);
-		if (ret != BTRFS_SUPER_INFO_SIZE)
-			goto write_err;
+		if (ret != BTRFS_SUPER_INFO_SIZE) {
+			errno = EIO;
+			error(
+		"failed to write super block for devid %llu: write error: %m",
+				device->devid);
+			return -EIO;
+		}
 		return 0;
 	}
 
@@ -1622,18 +1627,16 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 		 */
 		ret = pwrite64(device->fd, fs_info->super_copy,
 				BTRFS_SUPER_INFO_SIZE, bytenr);
-		if (ret != BTRFS_SUPER_INFO_SIZE)
-			goto write_err;
+		if (ret != BTRFS_SUPER_INFO_SIZE) {
+			errno = EIO;
+			error(
+		"failed to write super block for devid %llu: write error: %m",
+				device->devid);
+			return -errno;
+		}
 	}
 
 	return 0;
-
-write_err:
-	if (ret > 0)
-		fprintf(stderr, "WARNING: failed to write all sb data\n");
-	else
-		fprintf(stderr, "WARNING: failed to write sb: %m\n");
-	return ret;
 }
 
 int write_all_supers(struct btrfs_fs_info *fs_info)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read
  2019-03-27  9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
@ 2019-03-27 11:34   ` Nikolay Borisov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2019-03-27 11:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 27.03.19 г. 11:46 ч., Qu Wenruo wrote:
> When we failed to write super blocks, we just output something like:
>   WARNING: failed to write sb: I/O error
> Or
>   WARNING: failed to write all sb data
> 
> There is no info about which device failed and there are two different
> error message for the same write error.
> 
> This patch will change it to something more detailed:
> ERROR: failed to write super block for devid 1: write error: I/O error
> 
> This provides the basis for later super block flush error handling.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  disk-io.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/disk-io.c b/disk-io.c
> index 797b9b79ea3c..f7fb7026cd94 100644
> --- a/disk-io.c
> +++ b/disk-io.c
> @@ -1599,8 +1599,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
>  		ret = pwrite64(device->fd, fs_info->super_copy,
>  				BTRFS_SUPER_INFO_SIZE,
>  				fs_info->super_bytenr);
> -		if (ret != BTRFS_SUPER_INFO_SIZE)
> -			goto write_err;
> +		if (ret != BTRFS_SUPER_INFO_SIZE) {
> +			errno = EIO;
> +			error(
> +		"failed to write super block for devid %llu: write error: %m",
> +				device->devid);
> +			return -EIO;
> +		}
>  		return 0;
>  	}
>  
> @@ -1622,18 +1627,16 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
>  		 */
>  		ret = pwrite64(device->fd, fs_info->super_copy,
>  				BTRFS_SUPER_INFO_SIZE, bytenr);
> -		if (ret != BTRFS_SUPER_INFO_SIZE)
> -			goto write_err;
> +		if (ret != BTRFS_SUPER_INFO_SIZE) {
> +			errno = EIO;
> +			error(
> +		"failed to write super block for devid %llu: write error: %m",
> +				device->devid);
> +			return -errno;
> +		}
>  	}
>  
>  	return 0;
> -
> -write_err:
> -	if (ret > 0)
> -		fprintf(stderr, "WARNING: failed to write all sb data\n");
> -	else
> -		fprintf(stderr, "WARNING: failed to write sb: %m\n");
> -	return ret;
>  }
>  
>  int write_all_supers(struct btrfs_fs_info *fs_info)
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA
  2019-03-27  9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo
  2019-03-27  9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
@ 2019-03-27  9:46 ` Qu Wenruo
  2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
  2 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27  9:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

[BUG]
There are tons of reports of btrfs-progs screwing up the fs, the most
recent one is "btrfs check --clear-space-cache v1" triggered BUG_ON()
and then leaving the fs with transid mismatch problem.

[CAUSE]
In kernel, we have block layer handing the flush work, even on devices
without FUA support (like most SATA device using default libata
settings), kernel handles FUA write by flushing the device, then normal
write, and finish it with another flush.

The pre-flush, write, post-flush works pretty well to implement FUA
write.

However in btrfs-progs we just use pwrite(), there is nothing keeping
the write order.

So even for basic v1 free space cache clearing, we have different vision
on the write sequence from kernel bio layer (by dm-log-writes) and user
space pwrite() calls.

In btrfs-progs, with extra debug output in write_tree_block() and
write_dev_supers(), we can see btrfs-progs follows the right write
sequence:

  Opening filesystem to check...
  Checking filesystem on /dev/mapper/log
  UUID: 3feb3c8b-4eb3-42f3-8e9c-0af22dd58ecf
  write tree block start=1708130304 gen=39
  write tree block start=1708146688 gen=39
  write tree block start=1708163072 gen=39
  write super devid=1 gen=39
  write tree block start=1708179456 gen=40
  write tree block start=1708195840 gen=40
  write super devid=1 gen=40
  write tree block start=1708130304 gen=41
  write tree block start=1708146688 gen=41
  write tree block start=1708228608 gen=41
  write super devid=1 gen=41
  write tree block start=1708163072 gen=42
  write tree block start=1708179456 gen=42
  write super devid=1 gen=42
  write tree block start=1708130304 gen=43
  write tree block start=1708146688 gen=43
  write super devid=1 gen=43
  Free space cache cleared

But from dm-log-writes, the bio sequence is a different story:

  replaying 1742: sector 131072, size 4096, flags 0(NONE)
  replaying 1743: sector 128, size 4096, flags 0(NONE) <<< Only one sb write
  replaying 1744: sector 2828480, size 4096, flags 0(NONE)
  replaying 1745: sector 2828488, size 4096, flags 0(NONE)
  replaying 1746: sector 2828496, size 4096, flags 0(NONE)
  replaying 1787: sector 2304120, size 4096, flags 0(NONE)
  ......
  replaying 1790: sector 2304144, size 4096, flags 0(NONE)
  replaying 1791: sector 2304152, size 4096, flags 0(NONE)
  replaying 1792: sector 0, size 0, flags 8(MARK)

During the free space cache clearing, we committed 3 transaction but
dm-log-write only caught one super block write.

This means all the 3 writes were merged into the last super block write.
And the super block write was the 2nd write, before all tree block
writes, completely screwing up the metadata CoW protection.

No wonder crashed btrfs-progs can make things worse.

[FIX]
Fix this super serious problem by implementing pre and post flush for
the primary super block in btrfs-progs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 disk-io.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/disk-io.c b/disk-io.c
index f7fb7026cd94..fd56436d70ef 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1585,6 +1585,17 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 	u32 crc;
 	int i, ret;
 
+	/*
+	 * We need to write super block after all metadata written.
+	 * This is the equivalent of kernel pre-flush for FUA.
+	 */
+	ret = fsync(device->fd);
+	if (ret < 0) {
+		error(
+		"failed to write super block for devid %llu: flush error: %m",
+			device->devid);
+		return -errno;
+	}
 	if (fs_info->super_bytenr != BTRFS_SUPER_INFO_OFFSET) {
 		btrfs_set_super_bytenr(sb, fs_info->super_bytenr);
 		crc = ~(u32)0;
@@ -1606,6 +1617,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 				device->devid);
 			return -EIO;
 		}
+		ret = fsync(device->fd);
+		if (ret < 0) {
+			error(
+		"failed to write super block for devid %llu: flush error: %m",
+				device->devid);
+			return -errno;
+		}
 		return 0;
 	}
 
@@ -1634,6 +1652,19 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 				device->devid);
 			return -errno;
 		}
+		/*
+		 * Flush after the primary sb write, this is the equivalent of
+		 * kernel post-flush for FUA write.
+		 */
+		if (i == 0) {
+			ret = fsync(device->fd);
+			if (ret < 0) {
+				error(
+		"failed to write super block for devid %llu: flush error: %m",
+					device->devid);
+				return -errno;
+			}
+		}
 	}
 
 	return 0;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27  9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo
  2019-03-27  9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
  2019-03-27  9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo
@ 2019-03-27 14:07 ` Adam Borowski
  2019-03-27 14:17   ` Hugo Mills
                     ` (2 more replies)
  2 siblings, 3 replies; 10+ messages in thread
From: Adam Borowski @ 2019-03-27 14:07 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
> This urgent patchset can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/flush_super
> Which is based on v4.20.2.
> 
> Before this patch, btrfs-progs writes to the fs has no barrier at all.
> All metadata and superblock are just buffered write, no barrier between
> super blocks and metadata writes at all.
> 
> No wonder why even clear space cache can cause serious transid
> corruption to the originally good fs.
> 
> Please merge this fix as soon as possible as I really don't want to see
> btrfs-progs corrupting any fs any more.

How often does this happen in practice?  I'm slightly incredulous about
btrfs-progs crashing often.   Especially that pwrite() is buffered on the
kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
consistency.  Obviously, a potential data loss bug is always something that
needs fixing, I'm just wondering about severity.

Or do I understand this wrong?

Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
this package, and I'm taking up the mantle (with Nicholas D Steeves being
around) -- modulo any updates other than important bug fixes being on hold
because of Debian's freeze.  Thus, I wonder if this is important enough to
ask for a freeze exception.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8"
⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs?
⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
@ 2019-03-27 14:17   ` Hugo Mills
  2019-03-27 14:39   ` Qu Wenruo
  2019-03-27 14:48   ` Qu Wenruo
  2 siblings, 0 replies; 10+ messages in thread
From: Hugo Mills @ 2019-03-27 14:17 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Qu Wenruo, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2713 bytes --]

On Wed, Mar 27, 2019 at 03:07:48PM +0100, Adam Borowski wrote:
> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
> > This urgent patchset can be fetched from github:
> > https://github.com/adam900710/btrfs-progs/tree/flush_super
> > Which is based on v4.20.2.
> > 
> > Before this patch, btrfs-progs writes to the fs has no barrier at all.
> > All metadata and superblock are just buffered write, no barrier between
> > super blocks and metadata writes at all.
> > 
> > No wonder why even clear space cache can cause serious transid
> > corruption to the originally good fs.
> > 
> > Please merge this fix as soon as possible as I really don't want to see
> > btrfs-progs corrupting any fs any more.
> 
> How often does this happen in practice?  I'm slightly incredulous about
> btrfs-progs crashing often.   Especially that pwrite() is buffered on the
> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
> consistency.  Obviously, a potential data loss bug is always something that
> needs fixing, I'm just wondering about severity.

   It's a pretty regular event -- there's often a segfault or other
uncontrolled exit when running btrfs check on a broken filesystem.
It's usually hard to say whether that kind of thing (in --repair mode)
is causing additional corruption, or whether it's not fixing anything,
or whether it's fixing something and exposing the next error down.

> Or do I understand this wrong?
> 
> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
> this package, and I'm taking up the mantle (with Nicholas D Steeves being
> around) -- modulo any updates other than important bug fixes being on hold
> because of Debian's freeze.  Thus, I wonder if this is important enough to
> ask for a freeze exception.

   My ha'penn'orth: it's probably not worth asking for a freeze
exception -- I don't think it makes normal operation of the btrfs
progs actively dangerous, but it's increasing risk somewhat on what
are generally pretty rare operations in the lifetime of a filesystem.
It's only the offline tools that are going to be affected here anyway
-- most of the use-cases for btrfs-progs are in telling the kernel
what to do, rather than modifying the FS directly.

   I'd say it's definitely worth fixing the issue upstream (which Qu
is doing), and then (if possible) backporting it to your maintained
packages after the Debian release.

[Other opinions are also available from alternative vendors].

   Hugo.

-- 
Hugo Mills             | Well, sir, the floor is yours. But remember, the
hugo@... carfax.org.uk | roof is ours!
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                             The Goons

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
  2019-03-27 14:17   ` Hugo Mills
@ 2019-03-27 14:39   ` Qu Wenruo
  2019-03-27 14:42     ` Qu Wenruo
  2019-03-27 14:48   ` Qu Wenruo
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27 14:39 UTC (permalink / raw)
  To: Adam Borowski, Qu Wenruo; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3154 bytes --]



On 2019/3/27 下午10:07, Adam Borowski wrote:
> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
>> This urgent patchset can be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/flush_super
>> Which is based on v4.20.2.
>>
>> Before this patch, btrfs-progs writes to the fs has no barrier at all.
>> All metadata and superblock are just buffered write, no barrier between
>> super blocks and metadata writes at all.
>>
>> No wonder why even clear space cache can cause serious transid
>> corruption to the originally good fs.
>>
>> Please merge this fix as soon as possible as I really don't want to see
>> btrfs-progs corrupting any fs any more.
> 
> How often does this happen in practice?

As long as some BUG_ON() triggers, it's highly possible some transid
error will happen.

>  I'm slightly incredulous about
> btrfs-progs crashing often.
We're making progress enhancing btrfs-progs, but just check the recent
mail list, there is a report of clear free space cache v1 causing
transid error:
https://lore.kernel.org/linux-btrfs/c59ce3ee-b0cd-f195-9dfa-11abd362d057@gmx.com/

And that's clear cache making the transid problem more serious.

Adding to this, we still have a case where bad cacheing em could lead to
BUG_ON (*), I think btrfs-progs currently is only safe for RO operation,
not heavy write operations.

*: The fix is already submitted:
https://patchwork.kernel.org/patch/10840313/


> Especially that pwrite() is buffered on the
> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
> consistency.  Obviously, a potential data loss bug is always something that
> needs fixing, I'm just wondering about severity.

Oh, I see the point.
But there is some case still very concerning:

- Trans 1 get committed, write the following ems:
  em at 16K (fs root, gen = 1)
  em at 32K
  em at 48K

- trans 2 get committed
  em at 64K (fs root, gen = 2)
  em at 80K

- trans 3 get half committed
  em at 16K (fs root, gen = 3)

only trans 2 get its super block written to kernel, trans 3 get aborted
before writing super block due to whatever the reason is.

And you can see in that case, kernel will write:
 em at 16K (newer gen)
 em at 32K
 em at 48K
 em at 64K
 em at 80K
 sb at 4K (gen = 2)

Then sb 2 will points to older fs root (gen = 1), but at that location,
we have fs root with gen = 3.

Causing the fs unable to be mounted.

> 
> Or do I understand this wrong?
> 
> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
> this package, and I'm taking up the mantle (with Nicholas D Steeves being
> around) -- modulo any updates other than important bug fixes being on hold
> because of Debian's freeze.  Thus, I wonder if this is important enough to
> ask for a freeze exception.

I can't help for packaging at all.
As I'm an Arch user, just like a lot of reporters here. (And "I'm using
Arch" here is not a meme).

Personally I understand Debian has its policy, but really for
btrfs-progs, we really like the upstream version.

Thanks,
Qu

> 
> 
> Meow!
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27 14:39   ` Qu Wenruo
@ 2019-03-27 14:42     ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27 14:42 UTC (permalink / raw)
  To: Qu Wenruo, Adam Borowski; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3498 bytes --]



On 2019/3/27 下午10:39, Qu Wenruo wrote:
> 
> 
> On 2019/3/27 下午10:07, Adam Borowski wrote:
>> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
>>> This urgent patchset can be fetched from github:
>>> https://github.com/adam900710/btrfs-progs/tree/flush_super
>>> Which is based on v4.20.2.
>>>
>>> Before this patch, btrfs-progs writes to the fs has no barrier at all.
>>> All metadata and superblock are just buffered write, no barrier between
>>> super blocks and metadata writes at all.
>>>
>>> No wonder why even clear space cache can cause serious transid
>>> corruption to the originally good fs.
>>>
>>> Please merge this fix as soon as possible as I really don't want to see
>>> btrfs-progs corrupting any fs any more.
>>
>> How often does this happen in practice?
> 
> As long as some BUG_ON() triggers, it's highly possible some transid
> error will happen.
> 
>>  I'm slightly incredulous about
>> btrfs-progs crashing often.
> We're making progress enhancing btrfs-progs, but just check the recent
> mail list, there is a report of clear free space cache v1 causing
> transid error:
> https://lore.kernel.org/linux-btrfs/c59ce3ee-b0cd-f195-9dfa-11abd362d057@gmx.com/
> 
> And that's clear cache making the transid problem more serious.
> 
> Adding to this, we still have a case where bad cacheing em could lead to
> BUG_ON (*), I think btrfs-progs currently is only safe for RO operation,
> not heavy write operations.
> 
> *: The fix is already submitted:
> https://patchwork.kernel.org/patch/10840313/
> 
> 
>> Especially that pwrite() is buffered on the
>> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
>> consistency.  Obviously, a potential data loss bug is always something that
>> needs fixing, I'm just wondering about severity.
> 
> Oh, I see the point.
> But there is some case still very concerning:
> 
> - Trans 1 get committed, write the following ems:
>   em at 16K (fs root, gen = 1)
>   em at 32K
>   em at 48K
> 
> - trans 2 get committed
>   em at 64K (fs root, gen = 2)

Slightly wrong, in trans 2, fs root is not updated.

So please discard this mail, I'll resend a better version.

Thanks,
Qu

>   em at 80K
> 
> - trans 3 get half committed
>   em at 16K (fs root, gen = 3)
> 
> only trans 2 get its super block written to kernel, trans 3 get aborted
> before writing super block due to whatever the reason is.
> 
> And you can see in that case, kernel will write:
>  em at 16K (newer gen)
>  em at 32K
>  em at 48K
>  em at 64K
>  em at 80K
>  sb at 4K (gen = 2)
> 
> Then sb 2 will points to older fs root (gen = 1), but at that location,
> we have fs root with gen = 3.
> 
> Causing the fs unable to be mounted.
> 
>>
>> Or do I understand this wrong?
>>
>> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
>> this package, and I'm taking up the mantle (with Nicholas D Steeves being
>> around) -- modulo any updates other than important bug fixes being on hold
>> because of Debian's freeze.  Thus, I wonder if this is important enough to
>> ask for a freeze exception.
> 
> I can't help for packaging at all.
> As I'm an Arch user, just like a lot of reporters here. (And "I'm using
> Arch" here is not a meme).
> 
> Personally I understand Debian has its policy, but really for
> btrfs-progs, we really like the upstream version.
> 
> Thanks,
> Qu
> 
>>
>>
>> Meow!
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
  2019-03-27 14:17   ` Hugo Mills
  2019-03-27 14:39   ` Qu Wenruo
@ 2019-03-27 14:48   ` Qu Wenruo
  2019-03-31 14:42     ` Qu Wenruo
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27 14:48 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2462 bytes --]



On 2019/3/27 下午10:07, Adam Borowski wrote:
> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
>> This urgent patchset can be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/flush_super
>> Which is based on v4.20.2.
>>
>> Before this patch, btrfs-progs writes to the fs has no barrier at all.
>> All metadata and superblock are just buffered write, no barrier between
>> super blocks and metadata writes at all.
>>
>> No wonder why even clear space cache can cause serious transid
>> corruption to the originally good fs.
>>
>> Please merge this fix as soon as possible as I really don't want to see
>> btrfs-progs corrupting any fs any more.
> 
> How often does this happen in practice?  I'm slightly incredulous about
> btrfs-progs crashing often.   Especially that pwrite() is buffered on the
> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
> consistency.  Obviously, a potential data loss bug is always something that
> needs fixing, I'm just wondering about severity.

Here is a valid case where a crash could cause transid error:

- transaction 1
  new em at 16K (fs root, gen = 1)
  new em at 32K (extent root, gen = 1)
  new em at 48K (tree root, gen = 1)
  sb->fs root = gen 1
  sb->extent root = gen 1
  sb->tree root = gen 1

- transaction 2
  new em at 64K (extent root, gen = 2)
  new em at 80K (tree root, gen = 2)
  sb->fs root = gen 1 at 16K
  sb->extent root = gen 2
  sb->tree root = gen 2

- transaction 3, half backed due to error commit transaction
  new eb at 16K (tree root, gen = 3) submitted

In above case, we will write the newest eb at 16K to disk, but with sb
from transaction 2.

Then sb expects to read out a tree with gen 1, but get a tree with gen 3.
Further more, even we ignore the generation mismatch, the content of em
16K is completely wrong, super block of gen 2 expects fs root content
from em at 16K, but its content is tree root.

This should explain the severity much better.

Thanks,
Qu

> 
> Or do I understand this wrong?
> 
> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
> this package, and I'm taking up the mantle (with Nicholas D Steeves being
> around) -- modulo any updates other than important bug fixes being on hold
> because of Debian's freeze.  Thus, I wonder if this is important enough to
> ask for a freeze exception.
> 
> 
> Meow!
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
  2019-03-27 14:48   ` Qu Wenruo
@ 2019-03-31 14:42     ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-03-31 14:42 UTC (permalink / raw)
  To: Qu Wenruo, Adam Borowski, David Sterba; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2706 bytes --]

Not so gentle ping.

IMHO this fix itself should be worthy a minor release.

Thanks,
Qu

On 2019/3/27 下午10:48, Qu Wenruo wrote:
> 
> 
> On 2019/3/27 下午10:07, Adam Borowski wrote:
>> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote:
>>> This urgent patchset can be fetched from github:
>>> https://github.com/adam900710/btrfs-progs/tree/flush_super
>>> Which is based on v4.20.2.
>>>
>>> Before this patch, btrfs-progs writes to the fs has no barrier at all.
>>> All metadata and superblock are just buffered write, no barrier between
>>> super blocks and metadata writes at all.
>>>
>>> No wonder why even clear space cache can cause serious transid
>>> corruption to the originally good fs.
>>>
>>> Please merge this fix as soon as possible as I really don't want to see
>>> btrfs-progs corrupting any fs any more.
>>
>> How often does this happen in practice?  I'm slightly incredulous about
>> btrfs-progs crashing often.   Especially that pwrite() is buffered on the
>> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break
>> consistency.  Obviously, a potential data loss bug is always something that
>> needs fixing, I'm just wondering about severity.
> 
> Here is a valid case where a crash could cause transid error:
> 
> - transaction 1
>   new em at 16K (fs root, gen = 1)
>   new em at 32K (extent root, gen = 1)
>   new em at 48K (tree root, gen = 1)
>   sb->fs root = gen 1
>   sb->extent root = gen 1
>   sb->tree root = gen 1
> 
> - transaction 2
>   new em at 64K (extent root, gen = 2)
>   new em at 80K (tree root, gen = 2)
>   sb->fs root = gen 1 at 16K
>   sb->extent root = gen 2
>   sb->tree root = gen 2
> 
> - transaction 3, half backed due to error commit transaction
>   new eb at 16K (tree root, gen = 3) submitted
> 
> In above case, we will write the newest eb at 16K to disk, but with sb
> from transaction 2.
> 
> Then sb expects to read out a tree with gen 1, but get a tree with gen 3.
> Further more, even we ignore the generation mismatch, the content of em
> 16K is completely wrong, super block of gen 2 expects fs root content
> from em at 16K, but its content is tree root.
> 
> This should explain the severity much better.
> 
> Thanks,
> Qu
> 
>>
>> Or do I understand this wrong?
>>
>> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of
>> this package, and I'm taking up the mantle (with Nicholas D Steeves being
>> around) -- modulo any updates other than important bug fixes being on hold
>> because of Debian's freeze.  Thus, I wonder if this is important enough to
>> ask for a freeze exception.
>>
>>
>> Meow!
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-03-31 14:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-27  9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo
2019-03-27  9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
2019-03-27 11:34   ` Nikolay Borisov
2019-03-27  9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo
2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski
2019-03-27 14:17   ` Hugo Mills
2019-03-27 14:39   ` Qu Wenruo
2019-03-27 14:42     ` Qu Wenruo
2019-03-27 14:48   ` Qu Wenruo
2019-03-31 14:42     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox