* [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write
@ 2019-03-27 9:46 Qu Wenruo
2019-03-27 9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-03-27 9:46 UTC (permalink / raw)
To: linux-btrfs
This urgent patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/flush_super
Which is based on v4.20.2.
Before this patch, btrfs-progs writes to the fs has no barrier at all.
All metadata and superblock are just buffered write, no barrier between
super blocks and metadata writes at all.
No wonder why even clear space cache can cause serious transid
corruption to the originally good fs.
Please merge this fix as soon as possible as I really don't want to see
btrfs-progs corrupting any fs any more.
Changelog:
v1.1:
- Use one line error report other than 2 seperate lines.
Qu Wenruo (2):
btrfs-progs: disk-io: Make super block write error easier to read
btrfs-progs: disk-io: Flush to ensure super block write is FUA
disk-io.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 45 insertions(+), 11 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read 2019-03-27 9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo @ 2019-03-27 9:46 ` Qu Wenruo 2019-03-27 11:34 ` Nikolay Borisov 2019-03-27 9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski 2 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2019-03-27 9:46 UTC (permalink / raw) To: linux-btrfs When we failed to write super blocks, we just output something like: WARNING: failed to write sb: I/O error Or WARNING: failed to write all sb data There is no info about which device failed and there are two different error message for the same write error. This patch will change it to something more detailed: ERROR: failed to write super block for devid 1: write error: I/O error This provides the basis for later super block flush error handling. Signed-off-by: Qu Wenruo <wqu@suse.com> --- disk-io.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/disk-io.c b/disk-io.c index 797b9b79ea3c..f7fb7026cd94 100644 --- a/disk-io.c +++ b/disk-io.c @@ -1599,8 +1599,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, ret = pwrite64(device->fd, fs_info->super_copy, BTRFS_SUPER_INFO_SIZE, fs_info->super_bytenr); - if (ret != BTRFS_SUPER_INFO_SIZE) - goto write_err; + if (ret != BTRFS_SUPER_INFO_SIZE) { + errno = EIO; + error( + "failed to write super block for devid %llu: write error: %m", + device->devid); + return -EIO; + } return 0; } @@ -1622,18 +1627,16 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, */ ret = pwrite64(device->fd, fs_info->super_copy, BTRFS_SUPER_INFO_SIZE, bytenr); - if (ret != BTRFS_SUPER_INFO_SIZE) - goto write_err; + if (ret != BTRFS_SUPER_INFO_SIZE) { + errno = EIO; + error( + "failed to write super block for devid %llu: write error: %m", + device->devid); + return -errno; + } } return 0; - -write_err: - if (ret > 0) - fprintf(stderr, "WARNING: failed to write all sb data\n"); - else - fprintf(stderr, "WARNING: failed to write sb: %m\n"); - return ret; } int write_all_supers(struct btrfs_fs_info *fs_info) -- 2.21.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read 2019-03-27 9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo @ 2019-03-27 11:34 ` Nikolay Borisov 0 siblings, 0 replies; 10+ messages in thread From: Nikolay Borisov @ 2019-03-27 11:34 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 27.03.19 г. 11:46 ч., Qu Wenruo wrote: > When we failed to write super blocks, we just output something like: > WARNING: failed to write sb: I/O error > Or > WARNING: failed to write all sb data > > There is no info about which device failed and there are two different > error message for the same write error. > > This patch will change it to something more detailed: > ERROR: failed to write super block for devid 1: write error: I/O error > > This provides the basis for later super block flush error handling. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> > --- > disk-io.c | 25 ++++++++++++++----------- > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/disk-io.c b/disk-io.c > index 797b9b79ea3c..f7fb7026cd94 100644 > --- a/disk-io.c > +++ b/disk-io.c > @@ -1599,8 +1599,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, > ret = pwrite64(device->fd, fs_info->super_copy, > BTRFS_SUPER_INFO_SIZE, > fs_info->super_bytenr); > - if (ret != BTRFS_SUPER_INFO_SIZE) > - goto write_err; > + if (ret != BTRFS_SUPER_INFO_SIZE) { > + errno = EIO; > + error( > + "failed to write super block for devid %llu: write error: %m", > + device->devid); > + return -EIO; > + } > return 0; > } > > @@ -1622,18 +1627,16 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, > */ > ret = pwrite64(device->fd, fs_info->super_copy, > BTRFS_SUPER_INFO_SIZE, bytenr); > - if (ret != BTRFS_SUPER_INFO_SIZE) > - goto write_err; > + if (ret != BTRFS_SUPER_INFO_SIZE) { > + errno = EIO; > + error( > + "failed to write super block for devid %llu: write error: %m", > + device->devid); > + return -errno; > + } > } > > return 0; > - > -write_err: > - if (ret > 0) > - fprintf(stderr, "WARNING: failed to write all sb data\n"); > - else > - fprintf(stderr, "WARNING: failed to write sb: %m\n"); > - return ret; > } > > int write_all_supers(struct btrfs_fs_info *fs_info) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA 2019-03-27 9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo 2019-03-27 9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo @ 2019-03-27 9:46 ` Qu Wenruo 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski 2 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2019-03-27 9:46 UTC (permalink / raw) To: linux-btrfs; +Cc: Nikolay Borisov [BUG] There are tons of reports of btrfs-progs screwing up the fs, the most recent one is "btrfs check --clear-space-cache v1" triggered BUG_ON() and then leaving the fs with transid mismatch problem. [CAUSE] In kernel, we have block layer handing the flush work, even on devices without FUA support (like most SATA device using default libata settings), kernel handles FUA write by flushing the device, then normal write, and finish it with another flush. The pre-flush, write, post-flush works pretty well to implement FUA write. However in btrfs-progs we just use pwrite(), there is nothing keeping the write order. So even for basic v1 free space cache clearing, we have different vision on the write sequence from kernel bio layer (by dm-log-writes) and user space pwrite() calls. In btrfs-progs, with extra debug output in write_tree_block() and write_dev_supers(), we can see btrfs-progs follows the right write sequence: Opening filesystem to check... Checking filesystem on /dev/mapper/log UUID: 3feb3c8b-4eb3-42f3-8e9c-0af22dd58ecf write tree block start=1708130304 gen=39 write tree block start=1708146688 gen=39 write tree block start=1708163072 gen=39 write super devid=1 gen=39 write tree block start=1708179456 gen=40 write tree block start=1708195840 gen=40 write super devid=1 gen=40 write tree block start=1708130304 gen=41 write tree block start=1708146688 gen=41 write tree block start=1708228608 gen=41 write super devid=1 gen=41 write tree block start=1708163072 gen=42 write tree block start=1708179456 gen=42 write super devid=1 gen=42 write tree block start=1708130304 gen=43 write tree block start=1708146688 gen=43 write super devid=1 gen=43 Free space cache cleared But from dm-log-writes, the bio sequence is a different story: replaying 1742: sector 131072, size 4096, flags 0(NONE) replaying 1743: sector 128, size 4096, flags 0(NONE) <<< Only one sb write replaying 1744: sector 2828480, size 4096, flags 0(NONE) replaying 1745: sector 2828488, size 4096, flags 0(NONE) replaying 1746: sector 2828496, size 4096, flags 0(NONE) replaying 1787: sector 2304120, size 4096, flags 0(NONE) ...... replaying 1790: sector 2304144, size 4096, flags 0(NONE) replaying 1791: sector 2304152, size 4096, flags 0(NONE) replaying 1792: sector 0, size 0, flags 8(MARK) During the free space cache clearing, we committed 3 transaction but dm-log-write only caught one super block write. This means all the 3 writes were merged into the last super block write. And the super block write was the 2nd write, before all tree block writes, completely screwing up the metadata CoW protection. No wonder crashed btrfs-progs can make things worse. [FIX] Fix this super serious problem by implementing pre and post flush for the primary super block in btrfs-progs. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> --- disk-io.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/disk-io.c b/disk-io.c index f7fb7026cd94..fd56436d70ef 100644 --- a/disk-io.c +++ b/disk-io.c @@ -1585,6 +1585,17 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, u32 crc; int i, ret; + /* + * We need to write super block after all metadata written. + * This is the equivalent of kernel pre-flush for FUA. + */ + ret = fsync(device->fd); + if (ret < 0) { + error( + "failed to write super block for devid %llu: flush error: %m", + device->devid); + return -errno; + } if (fs_info->super_bytenr != BTRFS_SUPER_INFO_OFFSET) { btrfs_set_super_bytenr(sb, fs_info->super_bytenr); crc = ~(u32)0; @@ -1606,6 +1617,13 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, device->devid); return -EIO; } + ret = fsync(device->fd); + if (ret < 0) { + error( + "failed to write super block for devid %llu: flush error: %m", + device->devid); + return -errno; + } return 0; } @@ -1634,6 +1652,19 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, device->devid); return -errno; } + /* + * Flush after the primary sb write, this is the equivalent of + * kernel post-flush for FUA write. + */ + if (i == 0) { + ret = fsync(device->fd); + if (ret < 0) { + error( + "failed to write super block for devid %llu: flush error: %m", + device->devid); + return -errno; + } + } } return 0; -- 2.21.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo 2019-03-27 9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo 2019-03-27 9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo @ 2019-03-27 14:07 ` Adam Borowski 2019-03-27 14:17 ` Hugo Mills ` (2 more replies) 2 siblings, 3 replies; 10+ messages in thread From: Adam Borowski @ 2019-03-27 14:07 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: > This urgent patchset can be fetched from github: > https://github.com/adam900710/btrfs-progs/tree/flush_super > Which is based on v4.20.2. > > Before this patch, btrfs-progs writes to the fs has no barrier at all. > All metadata and superblock are just buffered write, no barrier between > super blocks and metadata writes at all. > > No wonder why even clear space cache can cause serious transid > corruption to the originally good fs. > > Please merge this fix as soon as possible as I really don't want to see > btrfs-progs corrupting any fs any more. How often does this happen in practice? I'm slightly incredulous about btrfs-progs crashing often. Especially that pwrite() is buffered on the kernel side, so we'd need a _kernel_ crash (usually a power loss) to break consistency. Obviously, a potential data loss bug is always something that needs fixing, I'm just wondering about severity. Or do I understand this wrong? Asking because Dimitri John Ledkov stepped down as Debian's maintainer of this package, and I'm taking up the mantle (with Nicholas D Steeves being around) -- modulo any updates other than important bug fixes being on hold because of Debian's freeze. Thus, I wonder if this is important enough to ask for a freeze exception. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8" ⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs? ⠈⠳⣄⠀⠀⠀⠀ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski @ 2019-03-27 14:17 ` Hugo Mills 2019-03-27 14:39 ` Qu Wenruo 2019-03-27 14:48 ` Qu Wenruo 2 siblings, 0 replies; 10+ messages in thread From: Hugo Mills @ 2019-03-27 14:17 UTC (permalink / raw) To: Adam Borowski; +Cc: Qu Wenruo, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2713 bytes --] On Wed, Mar 27, 2019 at 03:07:48PM +0100, Adam Borowski wrote: > On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: > > This urgent patchset can be fetched from github: > > https://github.com/adam900710/btrfs-progs/tree/flush_super > > Which is based on v4.20.2. > > > > Before this patch, btrfs-progs writes to the fs has no barrier at all. > > All metadata and superblock are just buffered write, no barrier between > > super blocks and metadata writes at all. > > > > No wonder why even clear space cache can cause serious transid > > corruption to the originally good fs. > > > > Please merge this fix as soon as possible as I really don't want to see > > btrfs-progs corrupting any fs any more. > > How often does this happen in practice? I'm slightly incredulous about > btrfs-progs crashing often. Especially that pwrite() is buffered on the > kernel side, so we'd need a _kernel_ crash (usually a power loss) to break > consistency. Obviously, a potential data loss bug is always something that > needs fixing, I'm just wondering about severity. It's a pretty regular event -- there's often a segfault or other uncontrolled exit when running btrfs check on a broken filesystem. It's usually hard to say whether that kind of thing (in --repair mode) is causing additional corruption, or whether it's not fixing anything, or whether it's fixing something and exposing the next error down. > Or do I understand this wrong? > > Asking because Dimitri John Ledkov stepped down as Debian's maintainer of > this package, and I'm taking up the mantle (with Nicholas D Steeves being > around) -- modulo any updates other than important bug fixes being on hold > because of Debian's freeze. Thus, I wonder if this is important enough to > ask for a freeze exception. My ha'penn'orth: it's probably not worth asking for a freeze exception -- I don't think it makes normal operation of the btrfs progs actively dangerous, but it's increasing risk somewhat on what are generally pretty rare operations in the lifetime of a filesystem. It's only the offline tools that are going to be affected here anyway -- most of the use-cases for btrfs-progs are in telling the kernel what to do, rather than modifying the FS directly. I'd say it's definitely worth fixing the issue upstream (which Qu is doing), and then (if possible) backporting it to your maintained packages after the Debian release. [Other opinions are also available from alternative vendors]. Hugo. -- Hugo Mills | Well, sir, the floor is yours. But remember, the hugo@... carfax.org.uk | roof is ours! http://carfax.org.uk/ | PGP: E2AB1DE4 | The Goons [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski 2019-03-27 14:17 ` Hugo Mills @ 2019-03-27 14:39 ` Qu Wenruo 2019-03-27 14:42 ` Qu Wenruo 2019-03-27 14:48 ` Qu Wenruo 2 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2019-03-27 14:39 UTC (permalink / raw) To: Adam Borowski, Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 3154 bytes --] On 2019/3/27 下午10:07, Adam Borowski wrote: > On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: >> This urgent patchset can be fetched from github: >> https://github.com/adam900710/btrfs-progs/tree/flush_super >> Which is based on v4.20.2. >> >> Before this patch, btrfs-progs writes to the fs has no barrier at all. >> All metadata and superblock are just buffered write, no barrier between >> super blocks and metadata writes at all. >> >> No wonder why even clear space cache can cause serious transid >> corruption to the originally good fs. >> >> Please merge this fix as soon as possible as I really don't want to see >> btrfs-progs corrupting any fs any more. > > How often does this happen in practice? As long as some BUG_ON() triggers, it's highly possible some transid error will happen. > I'm slightly incredulous about > btrfs-progs crashing often. We're making progress enhancing btrfs-progs, but just check the recent mail list, there is a report of clear free space cache v1 causing transid error: https://lore.kernel.org/linux-btrfs/c59ce3ee-b0cd-f195-9dfa-11abd362d057@gmx.com/ And that's clear cache making the transid problem more serious. Adding to this, we still have a case where bad cacheing em could lead to BUG_ON (*), I think btrfs-progs currently is only safe for RO operation, not heavy write operations. *: The fix is already submitted: https://patchwork.kernel.org/patch/10840313/ > Especially that pwrite() is buffered on the > kernel side, so we'd need a _kernel_ crash (usually a power loss) to break > consistency. Obviously, a potential data loss bug is always something that > needs fixing, I'm just wondering about severity. Oh, I see the point. But there is some case still very concerning: - Trans 1 get committed, write the following ems: em at 16K (fs root, gen = 1) em at 32K em at 48K - trans 2 get committed em at 64K (fs root, gen = 2) em at 80K - trans 3 get half committed em at 16K (fs root, gen = 3) only trans 2 get its super block written to kernel, trans 3 get aborted before writing super block due to whatever the reason is. And you can see in that case, kernel will write: em at 16K (newer gen) em at 32K em at 48K em at 64K em at 80K sb at 4K (gen = 2) Then sb 2 will points to older fs root (gen = 1), but at that location, we have fs root with gen = 3. Causing the fs unable to be mounted. > > Or do I understand this wrong? > > Asking because Dimitri John Ledkov stepped down as Debian's maintainer of > this package, and I'm taking up the mantle (with Nicholas D Steeves being > around) -- modulo any updates other than important bug fixes being on hold > because of Debian's freeze. Thus, I wonder if this is important enough to > ask for a freeze exception. I can't help for packaging at all. As I'm an Arch user, just like a lot of reporters here. (And "I'm using Arch" here is not a meme). Personally I understand Debian has its policy, but really for btrfs-progs, we really like the upstream version. Thanks, Qu > > > Meow! > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 14:39 ` Qu Wenruo @ 2019-03-27 14:42 ` Qu Wenruo 0 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2019-03-27 14:42 UTC (permalink / raw) To: Qu Wenruo, Adam Borowski; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 3498 bytes --] On 2019/3/27 下午10:39, Qu Wenruo wrote: > > > On 2019/3/27 下午10:07, Adam Borowski wrote: >> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: >>> This urgent patchset can be fetched from github: >>> https://github.com/adam900710/btrfs-progs/tree/flush_super >>> Which is based on v4.20.2. >>> >>> Before this patch, btrfs-progs writes to the fs has no barrier at all. >>> All metadata and superblock are just buffered write, no barrier between >>> super blocks and metadata writes at all. >>> >>> No wonder why even clear space cache can cause serious transid >>> corruption to the originally good fs. >>> >>> Please merge this fix as soon as possible as I really don't want to see >>> btrfs-progs corrupting any fs any more. >> >> How often does this happen in practice? > > As long as some BUG_ON() triggers, it's highly possible some transid > error will happen. > >> I'm slightly incredulous about >> btrfs-progs crashing often. > We're making progress enhancing btrfs-progs, but just check the recent > mail list, there is a report of clear free space cache v1 causing > transid error: > https://lore.kernel.org/linux-btrfs/c59ce3ee-b0cd-f195-9dfa-11abd362d057@gmx.com/ > > And that's clear cache making the transid problem more serious. > > Adding to this, we still have a case where bad cacheing em could lead to > BUG_ON (*), I think btrfs-progs currently is only safe for RO operation, > not heavy write operations. > > *: The fix is already submitted: > https://patchwork.kernel.org/patch/10840313/ > > >> Especially that pwrite() is buffered on the >> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break >> consistency. Obviously, a potential data loss bug is always something that >> needs fixing, I'm just wondering about severity. > > Oh, I see the point. > But there is some case still very concerning: > > - Trans 1 get committed, write the following ems: > em at 16K (fs root, gen = 1) > em at 32K > em at 48K > > - trans 2 get committed > em at 64K (fs root, gen = 2) Slightly wrong, in trans 2, fs root is not updated. So please discard this mail, I'll resend a better version. Thanks, Qu > em at 80K > > - trans 3 get half committed > em at 16K (fs root, gen = 3) > > only trans 2 get its super block written to kernel, trans 3 get aborted > before writing super block due to whatever the reason is. > > And you can see in that case, kernel will write: > em at 16K (newer gen) > em at 32K > em at 48K > em at 64K > em at 80K > sb at 4K (gen = 2) > > Then sb 2 will points to older fs root (gen = 1), but at that location, > we have fs root with gen = 3. > > Causing the fs unable to be mounted. > >> >> Or do I understand this wrong? >> >> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of >> this package, and I'm taking up the mantle (with Nicholas D Steeves being >> around) -- modulo any updates other than important bug fixes being on hold >> because of Debian's freeze. Thus, I wonder if this is important enough to >> ask for a freeze exception. > > I can't help for packaging at all. > As I'm an Arch user, just like a lot of reporters here. (And "I'm using > Arch" here is not a meme). > > Personally I understand Debian has its policy, but really for > btrfs-progs, we really like the upstream version. > > Thanks, > Qu > >> >> >> Meow! >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski 2019-03-27 14:17 ` Hugo Mills 2019-03-27 14:39 ` Qu Wenruo @ 2019-03-27 14:48 ` Qu Wenruo 2019-03-31 14:42 ` Qu Wenruo 2 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2019-03-27 14:48 UTC (permalink / raw) To: Adam Borowski; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2462 bytes --] On 2019/3/27 下午10:07, Adam Borowski wrote: > On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: >> This urgent patchset can be fetched from github: >> https://github.com/adam900710/btrfs-progs/tree/flush_super >> Which is based on v4.20.2. >> >> Before this patch, btrfs-progs writes to the fs has no barrier at all. >> All metadata and superblock are just buffered write, no barrier between >> super blocks and metadata writes at all. >> >> No wonder why even clear space cache can cause serious transid >> corruption to the originally good fs. >> >> Please merge this fix as soon as possible as I really don't want to see >> btrfs-progs corrupting any fs any more. > > How often does this happen in practice? I'm slightly incredulous about > btrfs-progs crashing often. Especially that pwrite() is buffered on the > kernel side, so we'd need a _kernel_ crash (usually a power loss) to break > consistency. Obviously, a potential data loss bug is always something that > needs fixing, I'm just wondering about severity. Here is a valid case where a crash could cause transid error: - transaction 1 new em at 16K (fs root, gen = 1) new em at 32K (extent root, gen = 1) new em at 48K (tree root, gen = 1) sb->fs root = gen 1 sb->extent root = gen 1 sb->tree root = gen 1 - transaction 2 new em at 64K (extent root, gen = 2) new em at 80K (tree root, gen = 2) sb->fs root = gen 1 at 16K sb->extent root = gen 2 sb->tree root = gen 2 - transaction 3, half backed due to error commit transaction new eb at 16K (tree root, gen = 3) submitted In above case, we will write the newest eb at 16K to disk, but with sb from transaction 2. Then sb expects to read out a tree with gen 1, but get a tree with gen 3. Further more, even we ignore the generation mismatch, the content of em 16K is completely wrong, super block of gen 2 expects fs root content from em at 16K, but its content is tree root. This should explain the severity much better. Thanks, Qu > > Or do I understand this wrong? > > Asking because Dimitri John Ledkov stepped down as Debian's maintainer of > this package, and I'm taking up the mantle (with Nicholas D Steeves being > around) -- modulo any updates other than important bug fixes being on hold > because of Debian's freeze. Thus, I wonder if this is important enough to > ask for a freeze exception. > > > Meow! > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write 2019-03-27 14:48 ` Qu Wenruo @ 2019-03-31 14:42 ` Qu Wenruo 0 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2019-03-31 14:42 UTC (permalink / raw) To: Qu Wenruo, Adam Borowski, David Sterba; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2706 bytes --] Not so gentle ping. IMHO this fix itself should be worthy a minor release. Thanks, Qu On 2019/3/27 下午10:48, Qu Wenruo wrote: > > > On 2019/3/27 下午10:07, Adam Borowski wrote: >> On Wed, Mar 27, 2019 at 05:46:50PM +0800, Qu Wenruo wrote: >>> This urgent patchset can be fetched from github: >>> https://github.com/adam900710/btrfs-progs/tree/flush_super >>> Which is based on v4.20.2. >>> >>> Before this patch, btrfs-progs writes to the fs has no barrier at all. >>> All metadata and superblock are just buffered write, no barrier between >>> super blocks and metadata writes at all. >>> >>> No wonder why even clear space cache can cause serious transid >>> corruption to the originally good fs. >>> >>> Please merge this fix as soon as possible as I really don't want to see >>> btrfs-progs corrupting any fs any more. >> >> How often does this happen in practice? I'm slightly incredulous about >> btrfs-progs crashing often. Especially that pwrite() is buffered on the >> kernel side, so we'd need a _kernel_ crash (usually a power loss) to break >> consistency. Obviously, a potential data loss bug is always something that >> needs fixing, I'm just wondering about severity. > > Here is a valid case where a crash could cause transid error: > > - transaction 1 > new em at 16K (fs root, gen = 1) > new em at 32K (extent root, gen = 1) > new em at 48K (tree root, gen = 1) > sb->fs root = gen 1 > sb->extent root = gen 1 > sb->tree root = gen 1 > > - transaction 2 > new em at 64K (extent root, gen = 2) > new em at 80K (tree root, gen = 2) > sb->fs root = gen 1 at 16K > sb->extent root = gen 2 > sb->tree root = gen 2 > > - transaction 3, half backed due to error commit transaction > new eb at 16K (tree root, gen = 3) submitted > > In above case, we will write the newest eb at 16K to disk, but with sb > from transaction 2. > > Then sb expects to read out a tree with gen 1, but get a tree with gen 3. > Further more, even we ignore the generation mismatch, the content of em > 16K is completely wrong, super block of gen 2 expects fs root content > from em at 16K, but its content is tree root. > > This should explain the severity much better. > > Thanks, > Qu > >> >> Or do I understand this wrong? >> >> Asking because Dimitri John Ledkov stepped down as Debian's maintainer of >> this package, and I'm taking up the mantle (with Nicholas D Steeves being >> around) -- modulo any updates other than important bug fixes being on hold >> because of Debian's freeze. Thus, I wonder if this is important enough to >> ask for a freeze exception. >> >> >> Meow! >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-03-31 14:43 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-03-27 9:46 [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Qu Wenruo 2019-03-27 9:46 ` [PATCH URGENT v1.1 1/2] btrfs-progs: disk-io: Make super block write error easier to read Qu Wenruo 2019-03-27 11:34 ` Nikolay Borisov 2019-03-27 9:46 ` [PATCH URGENT v1.1 2/2] btrfs-progs: disk-io: Flush to ensure super block write is FUA Qu Wenruo 2019-03-27 14:07 ` [PATCH URGENT v1.1 0/2] btrfs-progs: Fix the nobarrier behavior of write Adam Borowski 2019-03-27 14:17 ` Hugo Mills 2019-03-27 14:39 ` Qu Wenruo 2019-03-27 14:42 ` Qu Wenruo 2019-03-27 14:48 ` Qu Wenruo 2019-03-31 14:42 ` Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox