* evidence of persistent state, despite device disconnects
From: Chris Murphy @ 2016-01-02 19:22 UTC
To: Btrfs BTRFS

OK, I basically do not trust the f'n kernel anymore. I'm having to
reboot in order to get to a (reasonably) deterministic state. Merely
disconnecting devices doesn't make all aspects of that device and its
filesystem vanish. I think this persistence might be causing some
Btrfs corruptions that don't seem to make any sense. Here is one
example that I've kept track of every step of the way:

I have a Btrfs raid1 that fails to mount rw,degraded:

[  174.520303] BTRFS info (device sdc): allowing degraded mounts
[  174.520421] BTRFS info (device sdc): disk space caching is enabled
[  174.520527] BTRFS: has skinny extents
[  174.528060] BTRFS warning (device sdc): devid 1 uuid 94c62352-2568-4abe-8a58-828d1766719c is missing
[  177.924127] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
[  177.950761] BTRFS: open_ctree failed

When mounted -o ro,degraded:

[root@f23s ~]# btrfs fi df /mnt/brick2
Data, RAID1: total=502.00GiB, used=499.69GiB
Data, single: total=1.00GiB, used=2.00MiB
System, RAID1: total=32.00MiB, used=80.00KiB
System, single: total=32.00MiB, used=32.00KiB
Metadata, RAID1: total=2.00GiB, used=1008.22MiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=352.00MiB, used=0.00B

What the F? Because the last time it was normal/non-degraded and
mounted, the only chunks were raid1 chunks. Somehow, single chunks
have been added and used without any kernel messages to warn the user
they no longer have a raid1, in effect.

What *exactly* happened since this was an intact raid1-only, 2-device
volume?

1. umount /mnt/brick  ## cleanly umounted
2. ## USB cables from the drives disconnected
3. lsblk and blkid see neither of them
4. devid1 is reconnected
5. devid1 is issued ATA security-erase-enhanced command via hdparm
6. devid1 is physically disconnected
7. oldidevid1 is luksformatted and opened
8. devid2 is connected
9. [root@f23s ~]# lsblk -f
NAME  FSTYPE       LABEL   UUID                                  MOUNTPOINT
sdb   crypto_LUKS          493a7656-8fe6-46e9-88af-a0ffe83ced7e
└─sdb
sdc   btrfs        second  197606b2-9f4a-4742-8824-7fc93285c29c  /mnt/brick2

[root@f23s ~]# btrfs fi show /mnt/brick2
Label: 'second'  uuid: 197606b2-9f4a-4742-8824-7fc93285c29c
        Total devices 2 FS bytes used 500.68GiB
        devid    1 size 697.64GiB used 504.03GiB path /dev/sdb
        devid    2 size 697.64GiB used 504.03GiB path /dev/sdc

WTF?! This shouldn't be possible. devid1 is *completely* obliterated.
It was securely erased. It has been luks formatted. It has been
disconnected multiple times (as has devid2). And yet Btrfs sees this
as an intact pair? That's just complete crap. *AND* It lets me mount
it! Not degraded! No error messages!

11. umount /mnt/brick2
12. Reboot
13. btrfs fi show
warning, device 1 is missing
warning devid 1 not found already
Label: 'second'  uuid: 197606b2-9f4a-4742-8824-7fc93285c29c
        Total devices 2 FS bytes used 500.68GiB
        devid    2 size 697.64GiB used 506.06GiB path /dev/sdc
        *** Some devices missing

14. # mount -o degraded, /dev/sdc /mnt/brick2
mount: wrong fs type, bad option, bad superblock on /dev/sdc

and the trace at the very top, with the bogus "missing devices(1)
exceeds the limit(0), writeable mount is not allowed".

So during that non-degraded mount of the filesystem where it saw a
ghost of devid1, it wrote single chunks to devid2. And now devid2 can
only ever be mounted read only.
It's impossible to fix it, because I can't add devices when ro
mounted.

Does anyone have any idea what tool to use to explain how devid1,
/dev/sdb, which has been securely erased, luks formatted,
disconnected, and reconnected, *STILL* results in Btrfs thinking it's
a valid drive and allowing a non-degraded mount until there's a
reboot? That's really scary.

It's like the btrfs kernel code isn't refreshing its own fs or dev
states when other parts of the kernel know it's gone. Maybe a 'btrfs
dev scan' would have cleared this up, but I shouldn't have to do that
to refresh Btrfs's state anytime I disconnect and connect devices
just to make sure it doesn't sabotage the devices by surreptitiously
adding single chunks to one of the drives!

-- 
Chris Murphy
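A full reboot shouldn't be the only way out of that ghost state. The
sequence below is a sketch of the manual refresh being asked for here,
assuming btrfs is built as a module (the mountpoint is this thread's;
adjust to taste):

    umount /mnt/brick2    ## no btrfs may stay mounted for this to work
    modprobe -r btrfs     ## unload the module, dropping its per-device state
    modprobe btrfs        ## reload with an empty device registry
    btrfs device scan     ## re-register only the devices actually present
    btrfs fi show         ## the ghost devid 1 should now show as missing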
* Re: evidence of persistent state, despite device disconnects
From: Duncan @ 2016-01-03 13:48 UTC
To: linux-btrfs

Chris Murphy posted on Sat, 02 Jan 2016 12:22:07 -0700 as excerpted:

> OK, I basically do not trust the f'n kernel anymore. I'm having to
> reboot in order to get to a (reasonably) deterministic state. Merely
> disconnecting devices doesn't make all aspects of that device and its
> filesystem vanish.

We already knew that btrfs itself doesn't track device state very
well, and that a reboot or, for those with btrfs as a module, a module
unload/reload, was needed to fully clear state. Are you suggesting
it's more than that?

> I think this persistence might be causing some Btrfs corruptions that
> don't seem to make any sense. Here is one example that I've kept track
> of every step of the way:
>
> I have a Btrfs raid1 that fails to mount rw,degraded:

[Shortening the UUIDs for easier 80-column posting. I deleted them in
the first attempt, but decided they were useful here, as UUIDs are
about the only way to track what's what, as you will see, in the
absence of btrfs fi show, with mountpoints jumping between brick and
brick2, with references to devids that we don't know anything about
due to that lack of fi show output, etc.]

> [  174.520303] BTRFS info (device sdc): allowing degraded mounts
> [  174.520421] BTRFS info (device sdc): disk space caching is enabled
> [  174.520527] BTRFS: has skinny extents
> [  174.528060] BTRFS warning (device sdc):
>    devid 1 uuid [...]-828d1766719c is missing
> [  177.924127] BTRFS: missing devices(1) exceeds the limit(0),
>    writeable mount is not allowed
> [  177.950761] BTRFS: open_ctree failed

That's the -828 UUID...

OK, looks like your "raid1" must have some single or raid0 chunks,
which have a missing device limit of 0. BTW, what kernel? You don't
say.

Meanwhile, I lost track of whether the patch set to do per-chunk
evaluation of whether it's all there, thereby allowing degraded,rw
mounting of multi-device filesystems with single chunks only on
available devices, ever made it in, and if so, in which kernel. I
/think/ they were too late to make it into 4.3, but should have made
it into 4.4. But unfortunately, neither the 4.3 nor the 4.4 kernel
btrfs changes are up on the wiki yet, and to confirm it in git I'd
have to go back and figure out what those patches were named, which
I'm too lazy to do ATM. But of course without a reported kernel here,
knowing whether they made it in and for what kernel wouldn't help,
despite that information apparently being apropos to the situation.

> When mounted -o ro,degraded:
>
> [root@f23s ~]# btrfs fi df /mnt/brick2
> Data, RAID1: total=502.00GiB, used=499.69GiB
> Data, single: total=1.00GiB, used=2.00MiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> System, single: total=32.00MiB, used=32.00KiB
> Metadata, RAID1: total=2.00GiB, used=1008.22MiB
> Metadata, single: total=1.00GiB, used=0.00B
> GlobalReserve, single: total=352.00MiB, used=0.00B
>
> What the F?

OK, there we have the btrfs fi df. But there's no btrfs fi show. And
you posted the dmesg from the mount, but didn't give the commandline,
so we have nothing connecting the btrfs fi df /mnt/brick2 (note the
brick2) to the above dmesg output. No mount commandline, no btrfs fi
show, nothing else, at this point.
> Because the last time it was normal/non-degraded and mounted, the only
> chunks were raid1 chunks. Somehow, single chunks have been added and
> used without any kernel messages to warn the user they no longer have a
> raid1, in effect.
>
> What *exactly* happened since this was an intact raid1-only, 2-device
> volume?
>
> 1. umount /mnt/brick  ## cleanly umounted

OK, the above fi df was for /mnt/brick2. Here you're umounting
/mnt/brick. **NOT** the same mountpoint. So **NOT** cleanly umounted,
as that's an entirely different filesystem. Unless you did a
copy/pasto and you actually umounted brick2. But that's not what it
says...

> 2. ## USB cables from the drives disconnected
> 3. lsblk and blkid see neither of them
> 4. devid1 is reconnected

Wait... devid1? For brick or brick2? Either way, we have no idea what
devid1 is, because we don't have a btrfs fi show.

Honestly, CMurphy, your posts are /normally/ much more coherent than
this. Joking, but serious, are you still recovering from your new
year's partying? There are too many missing pieces and inconsistencies
here. It's not like your normal posts.

> 5. devid1 is issued ATA security-erase-enhanced command via hdparm
> 6. devid1 is physically disconnected
> 7. oldidevid1 is luksformatted and opened

Oldidevid1? Is that old devid1? You said it was physically
disconnected. Nothing about reconnection. So was it reconnected and
luksformatted, or is this a different device, presumably from some
much older btrfs devid1?

> 8. devid2 is connected
> 9. [root@f23s ~]# lsblk -f
> NAME  FSTYPE       LABEL   UUID               MOUNTPOINT
> sdb   crypto_LUKS          [...]-a0ffe83ced7e
> └─sdb
> sdc   btrfs        second  [...]-7fc93285c29c /mnt/brick2
>
> [root@f23s ~]# btrfs fi show /mnt/brick2
> Label: 'second'  uuid: [...]-7fc93285c29c
>         Total devices 2 FS bytes used 500.68GiB
>         devid    1 size 697.64GiB used 504.03GiB path /dev/sdb
>         devid    2 size 697.64GiB used 504.03GiB path /dev/sdc

UUIDs: No -828 UUID to match the dmesg output above. The -a0ff UUID is
new, apparently from the luksformatting in #7, and the -7fc UUID
matches between the lsblk and (NOW we get it!!) btrfs fi show, but
isn't the -828 UUID in the dmesg above, so that dmesg segment is
presumably for some other btrfs.

Note that with all the device disconnection and reconnection going on,
the /dev/sdc here wouldn't be expected to be the same device as the
/dev/sdc in the dmesg above, so mismatching UUIDs despite matching
/dev/sdc device-paths isn't at all unexpected.

Which would seem to imply that while we have a btrfs fi show now, it's
not the btrfs in the dmesg above, because the UUIDs don't match.
Either that, or the UUID in the dmesg isn't the filesystem UUID but
rather the device UUID. But I can't verify that right now, as the
dmesg output for a whole device doesn't list UUIDs, only the nominal
device node (nominal being the one used to mount, on multi-device
btrfs). Either way, the UUID in the dmesg from the btrfs mount error
doesn't match any other UUID we've seen, yet.

Meanwhile, both of these show a mounted btrfs on /mnt/brick2, but
there's no mount in the sequence above. Based on the sequence above,
nothing should be mounted at /mnt/brick2. But at this point there's
enough odd and nonsensical about what we know and don't know from the
post so far that this really isn't surprising...

> WTF?! This shouldn't be possible. devid1 is *completely* obliterated.
> It was securely erased. It has been luks formatted. It has been
> disconnected multiple times (as has devid2). And yet Btrfs sees this
> as an intact pair?
> That's just complete crap. *AND*

Why would you expect it to make any sense? The rest of the post
doesn't.

> It lets me mount it! Not degraded! No error messages!

Oh, here we're talking about a mount! But as I said, no mount in the
sequence! At this point it's just entertainment. I'm not even trying
to make sense of it any longer!

Meanwhile, we have #9 above, and #11, below, but no #10. I guess the
btrfs fi show is supposed to be #10. Or maybe #9 was supposed to be
#10 and include both the lsblk and the btrfs fi show, and #9 was
supposed to be the mount we're missing. Either way, more that doesn't
make any sense, in a post that already made no sense. <shrug>

> 11. umount /mnt/brick2
> 12. Reboot
> 13. btrfs fi show
> warning, device 1 is missing
> warning devid 1 not found already
> Label: 'second'  uuid: [...]-7fc93285c29c
>         Total devices 2 FS bytes used 500.68GiB
>         devid    2 size 697.64GiB used 506.06GiB path /dev/sdc
>         *** Some devices missing

OK, the -7fc UUID that was previously mounted on /mnt/brick2... And
this is a btrfs fi show, without a path, so it should list all btrfs
in the system, mounted or not. No others shown. Whatever happened to
the /mnt/brick filesystem umounted in #1, or the -828 UUID that the
dmesg at the top was complaining about as a missing device? No clue.
But there was no btrfs device scan done before that btrfs fi show.
Maybe that's why. Or maybe it's because the other btrfs entries were
manually edited out here.

> 14. # mount -o degraded, /dev/sdc /mnt/brick2
> mount: wrong fs type, bad option, bad superblock on /dev/sdc
>
> and the trace at the very top, with the bogus "missing devices(1)
> exceeds the limit(0), writeable mount is not allowed".
>
> So during that non-degraded mount of the filesystem where it saw a
> ghost of devid1, it wrote single chunks to devid2. And now devid2 can
> only ever be mounted read only. It's impossible to fix it, because I
> can't add devices when ro mounted.

The sequence still doesn't show where you actually did the mount that
actually worked, only the one in #14 that didn't work, or what command
you might have used. And the umount in #1 was apparently for an
entirely different /mnt/brick, while the lsblk and btrfs fi show in #9
clearly show /mnt/brick2, which, if the sequence above is to be
believed, remained mounted the entire time, including while you
unplugged its devices, plugged them back in and ATA secure-erased one,
then luksformatted it (tho you don't record the actual commands used,
so we don't know for sure you got the devices correct, particularly in
light of your already mixing up brick and brick2), all while the btrfs
on brick2 is still supposedly mounted, with a btrfs that we already
know doesn't track device disappearance particularly well.

In which case, I can see the still-mounted btrfs trying to write
raid1, and failing that, creating single chunks on the devices it
could still see, to try to write to. But that's very much not the only
thing mixed up here!

Meanwhile, if your kernel is one without the per-chunk patches
mentioned above, it could well be that the single chunks listed in
that btrfs fi df are indeed there, intact, and that it didn't try to
write to the other device at all. In fact, the presence of those
single-mode chunks indicates that it indeed *did* sense the missing
other device at some point, and wrote single chunks instead of raid1
chunks as a result.
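Whether the -828 value in that dmesg warning is a filesystem UUID or a
per-device UUID is checkable from userspace, for what it's worth. A
sketch, assuming 4.3-era btrfs-progs, where btrfs-show-super is the
superblock dump tool and its output includes the fsid and
dev_item.uuid fields:

    blkid -s UUID -o value /dev/sdc    ## filesystem UUID; should match btrfs fi show
    btrfs-show-super /dev/sdc | grep -E 'fsid|dev_item'
    ## dumps both the fsid and the per-device uuid from the superblock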
With a kernel with those per-chunk tracking patches, it might well
mount degraded,rw, and you may well have everything there, despite the
entirely mixed-up series of events above that make absolutely no sense
as reported.

> Does anyone have any idea what tool to use to explain how devid1,
> /dev/sdb, which has been securely erased, luks formatted,
> disconnected, and reconnected, *STILL* results in Btrfs thinking it's
> a valid drive and allowing a non-degraded mount until there's a
> reboot? That's really scary.
>
> It's like the btrfs kernel code isn't refreshing its own fs or dev
> states when other parts of the kernel know it's gone. Maybe a 'btrfs
> dev scan' would have cleared this up, but I shouldn't have to do that
> to refresh Btrfs's state anytime I disconnect and connect devices
> just to make sure it doesn't sabotage the devices by surreptitiously
> adding single chunks to one of the drives!

Based on the evidence, I'd guess that you actually mounted it
degraded,rw somewhere along the line, and it wrote those single-mode
chunks at that point. Further, whatever kernel you're running, I'd
guess it doesn't have the fairly recent patches checking data/metadata
availability per-chunk, and thus is exhibiting the known pre-patch
behavior of refusing a second degraded,rw mount when the first put
some single chunks on the existing drive, despite the contents of
those chunks, and thus the entire filesystem, still being available.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: evidence of persistent state, despite device disconnects
From: Chris Murphy @ 2016-01-03 21:33 UTC
To: Duncan
Cc: Btrfs BTRFS

kernel-4.4.0-0.rc6.git0.1.fc24.x86_64
btrfs-progs 4.3.1

There was some copy-pasting, hence the /mnt/brick vs /mnt/brick2
confusion, but the volume was always cleanly mounted and umounted.

The biggest problem I have with all of this is the completely silent
addition of single chunks. That made the volume, in effect, no longer
completely raid1. No other details matter, except to try to reproduce
the problem and find its source so it can be fixed. It is a bug,
because it's definitely not sane or expected behavior at all.

-- 
Chris Murphy
* Re: evidence of persistent state, despite device disconnects
From: Duncan @ 2016-01-05 14:50 UTC
To: linux-btrfs

Chris Murphy posted on Sun, 03 Jan 2016 14:33:40 -0700 as excerpted:

> kernel-4.4.0-0.rc6.git0.1.fc24.x86_64
> btrfs-progs 4.3.1
>
> There was some copy-pasting, hence the /mnt/brick vs /mnt/brick2
> confusion, but the volume was always cleanly mounted and umounted.
>
> The biggest problem I have with all of this is the completely silent
> addition of single chunks. That made the volume, in effect, no longer
> completely raid1. No other details matter, except to try to reproduce
> the problem and find its source so it can be fixed. It is a bug,
> because it's definitely not sane or expected behavior at all.

If there's no way you mounted it degraded,rw at any point, I agree:
single-mode chunks are unexpected on a raid1 for both data and
metadata, and it's a bug -- possibly actually related to that new code
that allows degraded,rw recovery via per-chunk checks.

If however you mounted it degraded,rw at some point, then I'd say the
bug is in wetware, as in that case, based on my understanding, it's
working as intended. I was inclined to believe that was what happened
based on the obviously partial sequence in the earlier post, but if
you say you didn't... then it's all down to duplication and finding
why it's suddenly reverting to single mode on non-degraded mounts,
which indeed /is/ a bug.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
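For anyone wanting to attempt that duplication without sacrificing USB
drives, a loop-device sketch of the disconnect-and-obliterate sequence
might look like this (sizes and paths are arbitrary, and whether loop
devices fully mimic USB unplug behavior is an open question):

    truncate -s 3G /tmp/d1.img /tmp/d2.img
    DEV1=$(losetup -f --show /tmp/d1.img)
    DEV2=$(losetup -f --show /tmp/d2.img)
    mkfs.btrfs -d raid1 -m raid1 "$DEV1" "$DEV2"
    mount "$DEV1" /mnt/test
    ## ... write some data ...
    umount /mnt/test
    wipefs -a "$DEV1"          ## stand-in for the secure erase of devid 1
    mount "$DEV2" /mnt/test    ## a non-degraded success here reproduces the ghost
    btrfs fi show /mnt/test    ## does btrfs still claim two intact devices?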
* Re: evidence of persistent state, despite device disconnects
From: Chris Murphy @ 2016-01-05 21:47 UTC
To: Duncan
Cc: Btrfs BTRFS

On Tue, Jan 5, 2016 at 7:50 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> If however you mounted it degraded,rw at some point, then I'd say the
> bug is in wetware, as in that case, based on my understanding, it's
> working as intended. I was inclined to believe that was what happened
> based on the obviously partial sequence in the earlier post, but if
> you say you didn't... then it's all down to duplication and finding
> why it's suddenly reverting to single mode on non-degraded mounts,
> which indeed /is/ a bug.

Clearly I will have to retest.

But even as rw,degraded, it doesn't matter; that'd still be a huge
bug. There's no possible way you'll convince me this is a user
misunderstanding. Nowhere is this documented.

I made the fs using mkfs.btrfs -draid1 -mraid1. There is no way the
fs, under any circumstance, legitimately creates and uses any other
profile for any chunk type, ever. Let alone silently.

-- 
Chris Murphy
* Re: evidence of persistent state, despite device disconnects
From: Duncan @ 2016-01-09 10:55 UTC
To: linux-btrfs

Chris Murphy posted on Tue, 05 Jan 2016 14:47:52 -0700 as excerpted:

> On Tue, Jan 5, 2016 at 7:50 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>
>> If however you mounted it degraded,rw at some point, then I'd say the
>> bug is in wetware, as in that case, based on my understanding, it's
>> working as intended. I was inclined to believe that was what happened
>> based on the obviously partial sequence in the earlier post, but if
>> you say you didn't... then it's all down to duplication and finding
>> why it's suddenly reverting to single mode on non-degraded mounts,
>> which indeed /is/ a bug.
>
> Clearly I will have to retest.
>
> But even as rw,degraded, it doesn't matter; that'd still be a huge
> bug. There's no possible way you'll convince me this is a user
> misunderstanding. Nowhere is this documented.
>
> I made the fs using mkfs.btrfs -draid1 -mraid1. There is no way the
> fs, under any circumstance, legitimately creates and uses any other
> profile for any chunk type, ever. Let alone silently.

If you're mounting degraded,rw, and you're down to a single device on
a raid1, then once the existing chunks fill up, it /has/ to create
single chunks, because it can't create them raid1 as there's not
enough devices (a minimum of two devices are required to create raid1
chunks, since two copies are required and they can't be on the same
device).

And by mounting degraded,rw you've given it permission to create those
single mode chunks if it has to, so it's not "silent", as you've
explicitly mounted it degraded,rw, and single is what raid1 degrades
to when there's only one device. And with automatic empty-chunk
deletion, existing chunks can fill up pretty fast...

Further, NOT letting it write single chunks when an otherwise raid1
btrfs is mounted in degraded,rw mode would very possibly prevent you
from repairing the filesystem with a btrfs replace, or a btrfs device
add and delete (sketched below). And we've been there, done that,
except slightly differently, with the case where you can only mount
degraded,rw until a single-mode chunk is written, after which you can
only mount degraded,ro, and then can't repair -- which is the problem
that the per-chunk check patches, vs. the old filesystem-scope check,
were designed to eliminate.

But as I said, if it's creating single chunks when you did /not/ have
it mounted degraded, then you indeed have found a bug, and figuring
out how to replicate it so it can be properly traced and fixed is
where we're left, as I can't see how anyone would find that not a bug.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
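For the record, the repair path referred to above, as a sketch only --
this assumes the degraded,rw mount is allowed and that the replacement
disk appears as /dev/sdb (device names are placeholders):

    mount -o degraded,rw /dev/sdc /mnt/brick2
    btrfs replace start 1 /dev/sdb /mnt/brick2   ## replace missing devid 1
    ## or, equivalently:
    ##   btrfs device add /dev/sdb /mnt/brick2
    ##   btrfs device delete missing /mnt/brick2
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/brick2
    ## the balance rewrites any single chunks created while degraded back
    ## to raid1; the "soft" filter skips chunks already in the target profile
    btrfs fi df /mnt/brick2    ## only GlobalReserve should still say "single"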
* Re: evidence of persistent state, despite device disconnects
From: Chris Murphy @ 2016-01-09 22:29 UTC
To: Duncan
Cc: Btrfs BTRFS

On Sat, Jan 9, 2016 at 3:55 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Tue, 05 Jan 2016 14:47:52 -0700 as excerpted:
>
>> On Tue, Jan 5, 2016 at 7:50 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>>> If however you mounted it degraded,rw at some point, then I'd say the
>>> bug is in wetware, as in that case, based on my understanding, it's
>>> working as intended. I was inclined to believe that was what happened
>>> based on the obviously partial sequence in the earlier post, but if
>>> you say you didn't... then it's all down to duplication and finding
>>> why it's suddenly reverting to single mode on non-degraded mounts,
>>> which indeed /is/ a bug.
>>
>> Clearly I will have to retest.
>>
>> But even as rw,degraded, it doesn't matter; that'd still be a huge
>> bug. There's no possible way you'll convince me this is a user
>> misunderstanding. Nowhere is this documented.
>>
>> I made the fs using mkfs.btrfs -draid1 -mraid1. There is no way the
>> fs, under any circumstance, legitimately creates and uses any other
>> profile for any chunk type, ever. Let alone silently.
>
> If you're mounting degraded,rw, and you're down to a single device on
> a raid1, then once the existing chunks fill up, it /has/ to create
> single chunks, because it can't create them raid1 as there's not
> enough devices (a minimum of two devices are required to create raid1
> chunks, since two copies are required and they can't be on the same
> device).
>
> And by mounting degraded,rw you've given it permission to create those
> single mode chunks if it has to, so it's not "silent", as you've
> explicitly mounted it degraded,rw, and single is what raid1 degrades
> to when there's only one device.

This is esoteric for mortal users (let alone without documentation):
that degraded,rw means single chunks will be made, and new data is no
longer replicated, even once the bad device is replaced and the volume
scrubbed.

There's an incongruity between the promise of "fault tolerance,
repair, and easy administration" and the esoteric reality. This is not
easy; this is a gotcha. I'll bet almost no users have any idea this is
how rw,degraded behaves and the risk it entails.

-- 
Chris Murphy
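The gotcha is at least cheap to detect after the fact, even if
undocumented. A one-liner sketch, run after replace and scrub
(mountpoint is this thread's):

    btrfs fi df /mnt/brick2 | grep -i single
    ## any Data/Metadata/System "single" line besides GlobalReserve
    ## means some post-failure writes are still unreplicated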
* Re: evidence of persistent state, despite device disconnects
From: Duncan @ 2016-01-10 5:34 UTC
To: linux-btrfs

Chris Murphy posted on Sat, 09 Jan 2016 15:29:31 -0700 as excerpted:

> On Sat, Jan 9, 2016 at 3:55 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> If you're mounting degraded,rw, and you're down to a single device on
>> a raid1, then once the existing chunks fill up, it /has/ to create
>> single chunks, because it can't create them raid1 as there's not
>> enough devices (a minimum of two devices are required to create raid1
>> chunks, since two copies are required and they can't be on the same
>> device).
>>
>> And by mounting degraded,rw you've given it permission to create those
>> single mode chunks if it has to, so it's not "silent", as you've
>> explicitly mounted it degraded,rw, and single is what raid1 degrades
>> to when there's only one device.
>
> This is esoteric for mortal users (let alone without documentation):
> that degraded,rw means single chunks will be made, and new data is no
> longer replicated, even once the bad device is replaced and the volume
> scrubbed.
>
> There's an incongruity between the promise of "fault tolerance,
> repair, and easy administration" and the esoteric reality. This is not
> easy; this is a gotcha. I'll bet almost no users have any idea this is
> how rw,degraded behaves and the risk it entails.

Certainly, documentation is an issue.

But while the degraded option doesn't force degraded, only allows it
if there are missing devices, it's not recommended, and this is one
reason why. Using the degraded option really /does/ give the
filesystem permission to break the rules that would apply in normal
operation, and adding it to your mount options shouldn't be done
lightly or routinely. Ideally, it's /only/ added after a device fails,
in order to be able to mount the filesystem and replace the
failing/failed device with a new one, or reshape the filesystem to one
fewer device if a new one isn't to be added.

OTOH, if there are three devices in the raid1, and all three have
unallocated space, then loss of a device shouldn't result in
single-mode chunks even when mounting degraded, because it's still
possible in that case to create raid1 chunks, as there are still two
devices with free space available. Again, creation of single chunks in
that case would be a bug.

But I think we're past the effective argument point and pretty much
just restating our positions at this point. Given that I'm definitely
not a btrfs coder, and to my knowledge, while you may well read the
code and do occasional trivial patches, you're not really a btrfs
coder either, alleviating that documentation issue, which we both
agree is there, is the best either of us can really do. The rest
remains with the real btrfs coders, and arguing further about it as
non-btrfs-devs isn't going to help.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: evidence of persistent state, despite device disconnects
From: Goffredo Baroncelli @ 2016-01-10 16:54 UTC
To: Duncan, linux-btrfs

On 2016-01-09 11:55, Duncan wrote:
> (a minimum of two devices are required to create raid1 chunks, since
> two copies are required and they can't be on the same device).

I think that this is the problem: BTRFS should allocate a new chunk as
RAID1 even if only one device is available. It is already capable of
using a RAID1 chunk in degraded mode, so it shouldn't be so difficult
to create a new RAID1 chunk when only one disk is available.

Anyway, I agree with Chris about the fact that btrfs sometimes gives
incorrect information about the devices.

In the past I proposed to abandon the current model, where the devices
are "pre-registered" asynchronously before the mount command. I wrote
a mount helper which does a scan at mount time [1]; this would reduce
the window of time in which a disappearing device could cause
confusion.

BR
G.Baroncelli

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg39429.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
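For readers who don't chase [1], the shape of such a helper is small
enough to sketch here. This is not Goffredo's actual code, just the
idea: mount(8) runs /sbin/mount.btrfs automatically for btrfs mounts,
and the -i below keeps the helper from re-invoking itself:

    #!/bin/sh
    ## /sbin/mount.btrfs (sketch): rescan synchronously, then really mount
    btrfs device scan
    exec mount -i -t btrfs "$@"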