* confusing behavior when supers mismatch
From: Chris Murphy @ 2019-03-10 23:09 UTC
To: Btrfs BTRFS

In the case where superblock 0 at 65536 is valid but stale (older than
the others):
1. btrfs check doesn't complain, the stale super is used for the check
2. when mounting, super 0 is used, no complaints at mount time, fairly
quickly the newer supers are overwritten

Is this expected? I ask in particular because `btrfs rescue super`
behaves differently: it considers super 0 a bad super and offers to fix
it from the newer ones, and when I answer y, it replaces super 0 with
newer information from the other supers.

I think the `btrfs rescue` behavior is correct. I would expect that
all the supers are read at mount time, and that if there's a discrepancy,
either there's code to skeptically sanity check the latest roots in
the newest super, or it flat out fails to mount. Mounting based on
stale super data seems risky, doesn't it?

--
Chris Murphy
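
For readers following the offsets: super 0 sits at 65536 bytes, and "the others" sit at fixed, much larger offsets. The sketch below assumes the usual btrfs layout (primary at 64 KiB, mirror n at 16 KiB shifted left by 12*n bits, 4 KiB per copy) and computes which copies fit on a device of a given size. The function and constant names are made up for illustration.

```python
from typing import List

SUPER_INFO_OFFSET = 64 * 1024   # primary copy ("super 0")
SUPER_INFO_SIZE = 4096          # bytes occupied by each copy
SUPER_MIRROR_MAX = 3            # primary + up to two backups
SUPER_MIRROR_SHIFT = 12


def sb_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 = primary)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    # 16 KiB << 12 = 64 MiB, 16 KiB << 24 = 256 GiB
    return (16 * 1024) << (SUPER_MIRROR_SHIFT * mirror)


def sb_offsets_for_device(dev_size: int) -> List[int]:
    """Offsets of every copy that fits entirely within dev_size bytes."""
    offsets = []
    for mirror in range(SUPER_MIRROR_MAX):
        off = sb_offset(mirror)
        if off + SUPER_INFO_SIZE > dev_size:
            break
        offsets.append(off)
    return offsets


if __name__ == "__main__":
    for mirror in range(SUPER_MIRROR_MAX):
        print(mirror, sb_offset(mirror))
    print(sb_offsets_for_device(100 * 1024**3))  # 100 GiB device
```

For a 100 GiB device this yields [65536, 67108864], the same two `superblock bytenr` values that `btrfs rescue super` prints in the transcript later in this thread; the third copy at 256 GiB does not fit.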

* Re: confusing behavior when supers mismatch
From: Chris Murphy @ 2019-03-10 23:18 UTC
To: Btrfs BTRFS

The described behavior was observed with:
btrfs-progs 4.20.2
kernel 4.20.12

--
Chris Murphy

* Re: confusing behavior when supers mismatch
From: Qu Wenruo @ 2019-03-11 1:17 UTC
To: Chris Murphy, Btrfs BTRFS

On 2019/3/11 7:09 AM, Chris Murphy wrote:
> In the case where superblock 0 at 65536 is valid but stale (older than
> the others):

Then this means either the fs is fuzzed, or the FUA implementation of
the disk is completely screwed up.

The btrfs kernel code submits superblocks in the following sequence:
1) wait for all metadata writes
2) flush
3) FUA the primary superblock
4) write the backup superblocks

If a backup is newer than the primary, then the FUA write didn't reach
the disk before the normal write. This means any fs on that disk could
be corrupted, not only btrfs.

>
> 1. btrfs check doesn't complain, the stale super is used for the check
> 2. when mounting, super 0 is used, no complaints at mount time, fairly
> quickly the newer supers are overwritten

The reason the kernel doesn't search the backup superblocks is to avoid
mounting a stale btrfs. For a sequence like mkfs.btrfs -> write to btrfs
-> mkfs.xfs -> try to mount as btrfs again, falling back to a backup
would cause problems.

So IMHO always using the primary superblock is the intended behavior.

Thanks,
Qu
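
The ordering argument above can be checked with a toy model. This is a sketch, not kernel code: it assumes a disk with a volatile write cache where, on power loss, any subset of not-yet-flushed writes may have reached the media, it does not model the metadata writes themselves, and it enumerates which generation each superblock copy could hold at every crash point. With FUA honored, a backup never ends up newer than the primary; if the disk acknowledges FUA without persisting it, it can.

```python
from itertools import combinations
from typing import Iterator, Tuple

OLD_GEN, NEW_GEN = 1, 2


def crash_states(honors_fua: bool) -> Iterator[Tuple[int, int]]:
    """Yield every (primary_gen, backup_gen) pair the media could hold if
    power is lost at any point of one superblock commit (steps 1-4)."""
    durable = {"primary": OLD_GEN, "backup": OLD_GEN}  # on stable media
    cache = []  # acknowledged writes that may still be volatile

    def on_power_loss() -> Iterator[Tuple[int, int]]:
        # Any subset of the cached writes may have reached the media.
        for r in range(len(cache) + 1):
            for subset in combinations(cache, r):
                state = dict(durable)
                for name, gen in subset:
                    state[name] = gen
                yield state["primary"], state["backup"]

    yield from on_power_loss()  # crash before the commit starts

    # Steps 1-2: metadata writes are done, then a flush empties the cache.
    for name, gen in cache:
        durable[name] = gen
    cache.clear()
    yield from on_power_loss()

    # Step 3: primary superblock written with FUA.
    if honors_fua:
        durable["primary"] = NEW_GEN  # stable before the write completes
    else:
        cache.append(("primary", NEW_GEN))  # broken disk: acked but cached
    yield from on_power_loss()

    # Step 4: backup superblocks written normally (cached).
    cache.append(("backup", NEW_GEN))
    yield from on_power_loss()


def stale_primary_possible(honors_fua: bool) -> bool:
    """True if some crash leaves a backup newer than the primary."""
    return any(backup > primary for primary, backup in crash_states(honors_fua))


if __name__ == "__main__":
    print("FUA honored:", sorted(set(crash_states(True))))
    print("FUA ignored:", sorted(set(crash_states(False))))
    print(stale_primary_possible(True), stale_primary_possible(False))
```

Only the broken-FUA case can produce the (1, 2) state, which is the "valid but stale super 0" situation from the original report.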

* Re: confusing behavior when supers mismatch
From: Chris Murphy @ 2019-03-11 3:20 UTC
To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Mar 10, 2019 at 7:18 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> On 2019/3/11 7:09 AM, Chris Murphy wrote:
> > In the case where superblock 0 at 65536 is valid but stale (older than
> > the others):
>
> Then this means either the fs is fuzzed, or the FUA implementation of
> the disk is completely screwed up.

Fuzzed in this case by me.

(Backstory: on the linux-raid@ list, a user accidentally zeroed the first
1MiB of an mdadm array that contains Btrfs, but has a backup of this
1MiB. So I was testing in advance the behavior of restoring this 1MiB
backup. But I'm guessing that, after the zeroing, the working file system
may have changed since it wasn't unmounted, and in fact probably wrote a
good replacement super very soon after the zeroing anyway. It seems the
only missing thing we need is the LVM metadata, maybe.)

> So IMHO always use the primary superblock is the designed behavior.

OK, interesting. So in what case are the backup supers used? Only by
`btrfs rescue super` or by explicit request? For example, I notice that
even with an erased primary super signature, `btrfs check -S1 --repair`
does not cause the S0 super to be fixed up; and `btrfs rescue super`
lacks an -S flag, so fixing an accidentally wiped Btrfs super requires
manual intervention.

--
Chris Murphy

* Re: confusing behavior when supers mismatch
From: Qu Wenruo @ 2019-03-11 4:58 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

On 2019/3/11 11:20 AM, Chris Murphy wrote:
> On Sun, Mar 10, 2019 at 7:18 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> So IMHO always use the primary superblock is the designed behavior.
>
> OK interesting. So in what case are the backup supers used? Only by
> `btrfs rescue super` or by explicit request, e.g. I notice even with
> an erased primary super signature, a `btrfs check -S1 --repair` does
> not cause the S0 super to be fixed up;

This is because there is nothing to repair, and thus no need to commit
a transaction. If --repair modified anything, then it should fix all
supers.

But indeed, this behavior is a problem.

> and `btrfs rescue super` lacks an -S flag, so fixing accidentally
> wiped Btrfs super requires manual intervention.

Normally 'btrfs rescue super' should be enough for an accidentally wiped
btrfs.

If not, then we should fix it, of course.

Thanks,
Qu
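
As described at the start of the thread, `btrfs rescue super` treats a valid but older super 0 as bad. The following is a rough sketch of that kind of classification, not the btrfs-progs implementation. It assumes the on-disk field offsets of `struct btrfs_super_block` (bytenr at 48, magic at 64, generation at 72, little-endian), takes a hypothetical `read_at(offset, length)` callback, and skips the crc32c checksum verification that real tools perform.

```python
import struct
from typing import Callable, Dict, List, Optional, Tuple

BTRFS_MAGIC = b"_BHRfS_M"
SUPER_INFO_SIZE = 4096
# Assumed field offsets inside struct btrfs_super_block:
# csum[32], fsid[16], then little-endian u64 bytenr, flags, magic, generation.
OFF_BYTENR, OFF_MAGIC, OFF_GENERATION = 48, 64, 72
SUPER_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]


def parse_super(buf: bytes, expected_bytenr: int) -> Optional[int]:
    """Return the generation of a plausible superblock copy, else None.
    Checksum verification is deliberately skipped in this sketch."""
    if len(buf) < OFF_GENERATION + 8:
        return None
    if buf[OFF_MAGIC:OFF_MAGIC + 8] != BTRFS_MAGIC:
        return None
    (bytenr,) = struct.unpack_from("<Q", buf, OFF_BYTENR)
    if bytenr != expected_bytenr:  # a copy must describe its own location
        return None
    (generation,) = struct.unpack_from("<Q", buf, OFF_GENERATION)
    return generation


def classify_supers(read_at: Callable[[int, int], bytes],
                    dev_size: int) -> Tuple[List[int], List[int]]:
    """Split copy offsets into (good, bad). A copy that parses but is older
    than the newest valid copy counts as bad, like a stale super 0."""
    gens: Dict[int, Optional[int]] = {}
    for off in SUPER_OFFSETS:
        if off + SUPER_INFO_SIZE > dev_size:
            break
        gens[off] = parse_super(read_at(off, SUPER_INFO_SIZE), off)
    newest = max((g for g in gens.values() if g is not None), default=None)
    good = [off for off, g in gens.items() if g is not None and g == newest]
    bad = [off for off in gens if off not in good]
    return good, bad


def make_super(bytenr: int, generation: int) -> bytes:
    """Build a fake 4 KiB superblock copy for testing (hypothetical helper)."""
    buf = bytearray(SUPER_INFO_SIZE)
    struct.pack_into("<Q", buf, OFF_BYTENR, bytenr)
    buf[OFF_MAGIC:OFF_MAGIC + 8] = BTRFS_MAGIC
    struct.pack_into("<Q", buf, OFF_GENERATION, generation)
    return bytes(buf)


if __name__ == "__main__":
    # Stale primary: super 0 at generation 7, backup at generation 9.
    disk = {65536: make_super(65536, 7), 67108864: make_super(67108864, 9)}

    def read_at(off: int, n: int) -> bytes:
        return disk.get(off, bytes(n))

    print(classify_supers(read_at, 100 * 1024**3))
```

By contrast, a primary-only policy like the kernel's would simply use super 0 here, which is the mismatch the original report describes.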

* Re: confusing behavior when supers mismatch
From: Chris Murphy @ 2019-03-11 5:19 UTC
To: Qu Wenruo; +Cc: Btrfs BTRFS

On Sun, Mar 10, 2019 at 10:58 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> Normally 'btrfs rescue super' should be enough for accidentally wiped btrfs.
>
> If not, then we should fix it of course.

Looks like a bug then.

$ sudo mkfs.btrfs /dev/mapper/vg-f30test
btrfs-progs v4.20.2
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               e819e675-a697-45f8-8351-d109d5b95514
Node size:          16384
Sector size:        4096
Filesystem size:    100.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1   100.00GiB  /dev/mapper/vg-f30test

$ sudo mount /dev/mapper/vg-f30test /mnt/test
$ sudo umount /mnt/test
$ sudo wipefs -a /dev/mapper/vg-f30test
/dev/mapper/vg-f30test: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
$ sudo btrfs rescue super -v /dev/mapper/vg-f30test
No valid Btrfs found on /dev/mapper/vg-f30test
Usage or syntax errors
$ echo "_BHRfS_M" | sudo dd bs=1 count=8 of=/dev/mapper/vg-f30test seek=$((64*1024+64))
8+0 records in
8+0 records out
8 bytes copied, 0.00523996 s, 1.5 kB/s
$ sudo btrfs rescue super -v /dev/mapper/vg-f30test
All Devices:
    Device: id = 1, name = /dev/mapper/vg-f30test

Before Recovering:
    [All good supers]:
        device name = /dev/mapper/vg-f30test
        superblock bytenr = 65536

        device name = /dev/mapper/vg-f30test
        superblock bytenr = 67108864

    [All bad supers]:

All supers are valid, no need to recover
$

--
Chris Murphy
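
The transcript shows why restoring only 8 bytes was enough: `wipefs` erased just the magic `_BHRfS_M` at 0x10040 (65536 + 64) in super 0 and left the rest of that copy, and the backup at 64 MiB, untouched. A minimal sketch on an in-memory image, with hypothetical helper names, shows the arithmetic and why a probe that only looks at super 0 stops recognizing the device. If the same `dd` trick were applied to a regular image file rather than a block device, `conv=notrunc` would be needed to keep dd from truncating the file at the seek offset.

```python
BTRFS_MAGIC = b"_BHRfS_M"
# Primary superblock at 64 KiB; the magic field sits 64 bytes into it.
PRIMARY_MAGIC_OFFSET = 64 * 1024 + 64


def has_primary_magic(img: bytes) -> bool:
    """Primary-only probe: look at super 0 and nowhere else."""
    return img[PRIMARY_MAGIC_OFFSET:PRIMARY_MAGIC_OFFSET + 8] == BTRFS_MAGIC


def wipe_primary_magic(img: bytearray) -> None:
    """Zero the 8 magic bytes, roughly what `wipefs -a` reported erasing."""
    img[PRIMARY_MAGIC_OFFSET:PRIMARY_MAGIC_OFFSET + 8] = bytes(8)


def restore_primary_magic(img: bytearray) -> None:
    """Same effect as the dd command above: write exactly 8 bytes
    (count=8 drops the newline echo appends) at offset 64*1024+64."""
    img[PRIMARY_MAGIC_OFFSET:PRIMARY_MAGIC_OFFSET + 8] = BTRFS_MAGIC


if __name__ == "__main__":
    print(hex(PRIMARY_MAGIC_OFFSET))                 # 0x10040
    print(bytes.fromhex("5f 42 48 52 66 53 5f 4d"))  # b'_BHRfS_M'
    img = bytearray(128 * 1024)    # toy image covering only super 0
    restore_primary_magic(img)     # stands in for a freshly made fs
    wipe_primary_magic(img)
    print(has_primary_magic(img))  # False
    restore_primary_magic(img)
    print(has_primary_magic(img))  # True
```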

* Re: confusing behavior when supers mismatch
From: Nikolay Borisov @ 2019-03-11 12:26 UTC
To: Qu Wenruo, Chris Murphy, Btrfs BTRFS

On 11.03.19 at 3:17, Qu Wenruo wrote:
>
> On 2019/3/11 7:09 AM, Chris Murphy wrote:
>> In the case where superblock 0 at 65536 is valid but stale (older than
>> the others):
>
> Then this means either the fs is fuzzed, or the FUA implementation of
> the disk is completely screwed up.
>
> Btrfs kernel submit super blocks as the following sequence:
> 1) wait all metadata write
> 2) flush
> 3) FUA the primary superblock

SATA devices generally do not have FUA support. For example, my EVO 850
SSDs do not support it, nor does my EVO 860 PRO. IMO, not having
functioning FUA seems to be the norm rather than the exception.

> 4) write the backup superblocks

* Re: confusing behavior when supers mismatch
From: Qu Wenruo @ 2019-03-11 12:35 UTC
To: Nikolay Borisov, Chris Murphy, Btrfs BTRFS

On 2019/3/11 8:26 PM, Nikolay Borisov wrote:
>
> On 11.03.19 at 3:17, Qu Wenruo wrote:
>> Btrfs kernel submit super blocks as the following sequence:
>> 1) wait all metadata write
>> 2) flush
>> 3) FUA the primary superblock
>
> SATA devices generally do not have FUA support. For example my evo 850
> ssds do not support it nor does my evo 860 PRO. IMO not having
> functioning FUA seems to be the norm rather than an exception.

The kernel block layer will translate FUA into write + flush.
So in that case we will do:

1) wait for all metadata writes
2) flush
3) write the primary sb, flush
4) write the backup sbs

Emulating FUA with write + flush is less atomic than native FUA, but it
should be good enough to be pseudo-atomic.

Thanks,
Qu

* Re: confusing behavior when supers mismatch
From: Nikolay Borisov @ 2019-03-11 12:37 UTC
To: Qu Wenruo, Chris Murphy, Btrfs BTRFS

On 11.03.19 at 14:35, Qu Wenruo wrote:
>
> On 2019/3/11 8:26 PM, Nikolay Borisov wrote:
>> SATA devices generally do not have FUA support. For example my evo 850
>> ssds do not support it nor does my evo 860 PRO. IMO not having
>> functioning FUA seems to be the norm rather than an exception.
>
> Kernel block layer will translate FUA to write + flush.

Where exactly does this happen?

* Re: confusing behavior when supers mismatch
From: Qu Wenruo @ 2019-03-11 13:27 UTC
To: Nikolay Borisov, Chris Murphy, Btrfs BTRFS

On 2019/3/11 8:37 PM, Nikolay Borisov wrote:
>
> On 11.03.19 at 14:35, Qu Wenruo wrote:
>> Kernel block layer will translate FUA to write + flush.
>
> Where exactly does this happen?

block/blk-flush.c

See the comment at the beginning:

 * If the device has writeback cache and doesn't support FUA, REQ_PREFLUSH
 * is translated to PREFLUSH and REQ_FUA to POSTFLUSH.

I need to dig further to find exactly which line does this, but I think
that comment explains the workflow well enough.

Thanks,
Qu
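
The blk-flush.c comment Qu quotes can be read as a small decision table. Below is a Python paraphrase of that decision for a single request, written from the comment's description rather than copied from the kernel; the flag constants and the `flush_sequence` name are stand-ins. It shows that on a device with a write-back cache but no FUA, a FUA write of the primary superblock becomes a plain write followed by a post-flush, and that without a write-back cache both flags make no difference.

```python
from typing import List

# Stand-in flag bits; the real kernel values differ.
REQ_PREFLUSH = 1 << 0
REQ_FUA = 1 << 1


def flush_sequence(has_data: bool, rq_flags: int,
                   writeback_cache: bool, device_fua: bool) -> List[str]:
    """Steps the block layer would issue for one request, per the comment."""
    steps = []
    if writeback_cache and rq_flags & REQ_PREFLUSH:
        steps.append("PREFLUSH")
    if has_data:
        # With native FUA support the flag is passed down with the data.
        native_fua = writeback_cache and device_fua and bool(rq_flags & REQ_FUA)
        steps.append("DATA+FUA" if native_fua else "DATA")
    if writeback_cache and not device_fua and rq_flags & REQ_FUA:
        # Emulated FUA: flush the cache after the data write completes.
        steps.append("POSTFLUSH")
    return steps


if __name__ == "__main__":
    # Primary superblock write on a cached disk without FUA (e.g. many SATA SSDs)
    print(flush_sequence(True, REQ_FUA, writeback_cache=True, device_fua=False))
    # Same write on a disk with native FUA
    print(flush_sequence(True, REQ_FUA, writeback_cache=True, device_fua=True))
```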

* Re: confusing behavior when supers mismatch
From: Anand Jain @ 2019-03-11 14:38 UTC
To: Chris Murphy, Btrfs BTRFS

On 3/11/19 7:09 AM, Chris Murphy wrote:
> In the case where superblock 0 at 65536 is valid but stale (older than
> the others):
>
> 1. btrfs check doesn't complain, the stale super is used for the check
> 2. when mounting, super 0 is used, no complaints at mount time, fairly
> quickly the newer supers are overwritten

More or less all of these were hardened in the patchset [1], which is on
the mailing list.

[1] [PATCH v4 0/7] Superblock read and verify cleanups

Thanks, Anand