* some free space cache corruptions
From: Christoph Anton Mitterer @ 2016-12-25 22:00 UTC
To: linux-btrfs

Hey.

Had the following on a Debian sid:
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux
btrfs-progs v4.7.3

I was doing a btrfs check of a rather big btrfs (8TB device, nearly
full), having many snapshots on it, all incrementally sent from another
8TB device, which in turn functions as the master copy:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 6805741969408 bytes used err is 0
total csum bytes: 6634558200
total tree bytes: 10292641792
total fs tree bytes: 2074869760
total extent tree bytes: 1100251136
btree space waste bytes: 885346193
file data blocks allocated: 6922343247872
 referenced 7040929374208
0

=> this already showed something unusual:
cache and super generation don't match, space cache will be invalidated

Where does it come from?

Then I did some incremental send/receive (-p) from the other 8TB master
btrfs and another fsck afterwards:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 7467006156800 bytes used err is 0
total csum bytes: 7279407560
total tree bytes: 11069603840
total fs tree bytes: 2127314944
total extent tree bytes: 1141342208
btree space waste bytes: 922662895
file data blocks allocated: 7599280926720
 referenced 7720960733184
0

=> all fine...

Afterwards I removed all ro-snapshots except the most recent one... and
repeated the fsck:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7427361222656 bytes used err is 0
total csum bytes: 7240763996
total tree bytes: 10998038528
total fs tree bytes: 2100297728
total extent tree bytes: 1137065984
btree space waste bytes: 992708933
file data blocks allocated: 7416363184128
 referenced 7536754290688
0

=> Isn't that some indication of a bug already? Nothing happened, just
deletion of snapshots, and there is apparently some free space cache
corruption?

Then I tried the usual recipe:
mount /data/data-a/2/ -o clear_cache

kernel said:
Dec 25 22:14:17 heisenberg kernel: BTRFS info (device dm-2): force clearing of disk cache

...re-mounted,rw, deleted some regular files... repeated the fsck and
again:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7427284213760 bytes used err is 0
total csum bytes: 7240689688
total tree bytes: 10997907456
total fs tree bytes: 2100281344
total extent tree bytes: 1137049600
btree space waste bytes: 992679805
file data blocks allocated: 7416286306304
 referenced 7536677412864
0

=> same error again...

Any ideas how to resolve? And is this some serious error that could have
caused corruptions?

Cheers,
Chris.
* Re: some free space cache corruptions
From: Duncan @ 2016-12-26 0:12 UTC
To: linux-btrfs

Christoph Anton Mitterer posted on Sun, 25 Dec 2016 23:00:34 +0100 as
excerpted:

> # btrfs check /dev/mapper/data-a2 ; echo $?
> Checking filesystem on /dev/mapper/data-a2

[...]

> checking free space cache
> block group 5431552376832 has wrong amount of free space
> failed to load free space cache for block group 5431552376832

[...]

> 0
>
> => same error again...
>
> Any ideas how to resolve? And is this some serious error that could have
> caused corruptions?

By themselves, free-space cache warnings are minor and not a serious
issue at all -- the cache is just that, a cache, designed to speed
operation but not actually necessary, and btrfs can detect and route
around space-cache corruption on-the-fly so by itself it's not a big deal.

These warnings are however hints that something out of the routine has
happened, and that you might wish to freshen your backups, run btrfs
check and scrub and see if anything else is wrong (that is, if you get
them at mount; you got them /running/ btrfs check, and nothing else out
of the ordinary was reported), etc.

Three things to note:

1) Plain btrfs check, without options that trigger fixes, is read-only,
so you are likely to see anything unusual it reports again in repeated
runs, unless the filesystem itself, or a scrub, etc, has fixed things in
the mean time.  (And as I said, the space-cache is only a cache, designed
to speed things up, cache corruption is fairly common and btrfs can and
does deal with it without issue.  In fact btrfs has the nospace_cache
option to entirely disable it at mount.)

2) It recently came to the attention of the devs that the existing btrfs
mount-option method of clearing the free-space cache only clears it for
block-groups/chunks it encounters on-the-fly.  It doesn't do a systematic
beginning-to-end clear.  As such, in some instances it's possible to run
with the clear_cache mount option (see the btrfs (5) manpage for mount
option specifics, but to head off another question, it's recommended to
stay with v1 cache for now) and still see space-cache warnings you
thought should be cleared up, if btrfs didn't deal with those chunks in
the run where the cache was cleared.

3) As a result of #2, the devs only very recently added support in btrfs
check for a /full/ space-cache-v1 clear, using the new
--clear-space-cache option.  But your btrfs-progs v4.7.3 is too old to
support it.  I know it's in the v4.9 I just upgraded to... checking the
wiki it appears the option was added in btrfs-progs v4.8.3 (v4.8.4 for v2
cache).

So if you want you can try the clear_cache mount option, and if that
doesn't do it, upgrade to a current btrfs-progs and run it with the
--clear-space-cache option, but you're not endangering your filesystem or
anything by simply waiting until you get a btrfs-progs update from your
distro, if you decide to.  The space-cache warnings aren't indicative of
a serious problem now and btrfs deals with them on its own, they are
simply hints that something, perhaps a crash with the btrfs mounted
writable, happened at some time in the past, and that it might be wise
to investigate further for other damage, which you've already done, so
you're good. =:^)

Tho if you haven't recently run a scrub, I'd do that as well (and in fact
recommend running it before check if you can successfully mount), since
the problems it detects and fixes are conceptually different than the
ones btrfs check deals with.  Scrub deals with actual on-media
corruption, blocks not matching their checksum, while check deals with
filesystem logic errors, whether or not the blocks containing them match
the checksum.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
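The choice between points 2) and 3) above can be sketched in shell. This is only an illustration under stated assumptions, not something posted in the thread: the function names are made up, `sort -V` is assumed to be available (GNU coreutils), the 4.8.3 threshold is the one quoted from the wiki above, and the `v1` argument to --clear-space-cache selects the v1 cache. The filesystem must be unmounted before the check step runs.

```shell
# version_ge A B: succeed if dotted version A is >= B.
# Relies on `sort -V` (version sort), assumed to be GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# parse_progs_version: read `btrfs --version` output on stdin,
# e.g. "btrfs-progs v4.7.3", and print just "4.7.3".
parse_progs_version() {
    sed -n 's/^btrfs-progs v\([0-9.]*\).*/\1/p'
}

# full_cache_clear DEV: do a full v1 space-cache clear if the installed
# btrfs-progs is new enough (hypothetical helper; DEV must be UNMOUNTED).
full_cache_clear() {
    dev=$1
    have=$(btrfs --version | parse_progs_version)
    if version_ge "$have" 4.8.3; then
        btrfs check --clear-space-cache v1 "$dev"
    else
        echo "btrfs-progs $have predates --clear-space-cache;" \
             "try mounting with -o clear_cache instead" >&2
        return 1
    fi
}
```

With the btrfs-progs v4.7.3 from the original report, `full_cache_clear /dev/mapper/data-a2` would take the fallback branch and only print the hint.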
* Re: some free space cache corruptions
From: Janos Toth F. @ 2016-12-26 7:06 UTC
To: Btrfs BTRFS

I am not sure I can remember a time when btrfs check did not print this
"cache and super generation don't match, space cache will be
invalidated" message, so I started ignoring it a long time ago because I
never seemed to have problem with missing free space and never got any
similar warnings/errors in the kernel log.
* Re: some free space cache corruptions
From: Christoph Anton Mitterer @ 2016-12-29 3:43 UTC
To: Duncan, linux-btrfs

On Mon, 2016-12-26 at 00:12 +0000, Duncan wrote:
> By themselves, free-space cache warnings are minor and not a serious
> issue at all -- the cache is just that, a cache, designed to speed
> operation but not actually necessary, and btrfs can detect and route
> around space-cache corruption on-the-fly so by itself it's not a big
> deal.
Well... sure about that? Haven't we had recently that serious bug in the
FST, which could cause data corruption as btrfs used space as free,
while it wasn't?

> These warnings are however hints that something out of the routine
> has happened
Which again just likely means that there was/is some bug in btrfs...
other than that, why should it suddenly get some corrupted cache, when
only ro-snapshots were removed in between?

> unless the filesystem itself, or a scrub, etc, has fixed things in
> the mean time.  (And as I said, the space-cache is only a cache,
> designed to speed things up, cache corruption is fairly common and
> btrfs can and does deal with it without issue.
When finishing the most recent backups, the fs in question got pretty
full and the error message I'd spotted during btrfs check appeared in
the kernel log as well:

Dec 29 03:03:11 heisenberg kernel: BTRFS warning (device dm-1): block group 5431552376832 has wrong amount of free space
Dec 29 03:03:11 heisenberg kernel: BTRFS warning (device dm-1): failed to load free space cache for block group 5431552376832, rebuilding it now

(the fs was NOT mounted with clear_cache), which implies it has now been
rebuilt.

However, after a subsequent fsck, the same error occurs there again:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7571911602176 bytes used err is 0
total csum bytes: 7381752972
total tree bytes: 11145035776
total fs tree bytes: 2100396032
total extent tree bytes: 1137082368
btree space waste bytes: 996179488
file data blocks allocated: 7560766566400
 referenced 7681157672960
0

> 2) It recently came to the attention of the devs that the existing
> btrfs mount-option method of clearing the free-space cache only
> clears it for block-groups/chunks it encounters on-the-fly.  It
> doesn't do a systematic beginning-to-end clear.
So that calls for fixing the documentation as well?!

> 3) As a result of #2, the devs only very recently added support in
> btrfs check for a /full/ space-cache-v1 clear, using the new
> --clear-space-cache option.  But your btrfs-progs v4.7.3 is too old
> to support it.  I know it's in the v4.9 I just upgraded to...
> checking the wiki it appears the option was added in btrfs-progs
> v4.8.3 (v4.8.4 for v2 cache).
And is the new option stable?! ;-)

> Tho if you haven't recently run a scrub, I'd do that as well
Well, I did a full verification using my own checksums (i.e. every
regular file in the fs has SHA512 sums attached as XATTR)... since that
caused all data to be read, this should be identical to a scrub (at
least for the regular files' data, not necessarily the metadata),
shouldn't it?

Cheers,
Chris.
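The per-file verification mentioned above might look roughly like the following shell sketch. It is an illustration only: the thread does not say which attribute name or tooling is actually used, so `user.sha512`, both function names, and the use of `getfattr` (from the attr package) are assumptions. Note also that reading a file exercises btrfs's data checksums only for the copy that happens to be read and does not walk all metadata, so such a pass is probably not a complete substitute for a scrub.

```shell
# digest_matches FILE EXPECTED_HEX: succeed if the SHA-512 of FILE
# equals EXPECTED_HEX (lowercase hex, as printed by sha512sum).
digest_matches() {
    actual=$(sha512sum -- "$1" | cut -d' ' -f1) || return 2
    [ "$actual" = "$2" ]
}

# verify_xattr_sum FILE: compare FILE against the digest stored in an
# extended attribute. "user.sha512" is a HYPOTHETICAL attribute name.
# Returns 2 if the attribute cannot be read.
verify_xattr_sum() {
    stored=$(getfattr --only-values -n user.sha512 -- "$1" 2>/dev/null) || return 2
    digest_matches "$1" "$stored"
}
```

A whole tree could then be walked with something like `find /data -xdev -type f` feeding each path to `verify_xattr_sum`, which forces every regular file's data to be read once.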
* Re: some free space cache corruptions
From: Duncan @ 2016-12-29 6:55 UTC
To: linux-btrfs

Christoph Anton Mitterer posted on Thu, 29 Dec 2016 04:43:35 +0100 as
excerpted:

> On Mon, 2016-12-26 at 00:12 +0000, Duncan wrote:
>> By themselves, free-space cache warnings are minor and not a serious
>> issue at all -- the cache is just that, a cache, designed to speed
>> operation but not actually necessary, and btrfs can detect and route
>> around space-cache corruption on-the-fly so by itself it's not a big
>> deal.
> Well... sure about that? Haven't we had recently that serious bug in the
> FST, which could cause data corruption as btrfs used space as free,
> while it wasn't?

Well, the free-space-tree (FST) itself remains experimental and not
recommended for general use yet.  The btrfs (5) manpage (as of -progs-4.9
at least) calls space_cache=v1 the safe default, and the wiki status page
lists v2 (tree) as orange level (/mostly/ OK).

And note that I said free-space _cache_, not free-space _tree_.

Of course that's not to (unwisely) claim there are no bugs in the
free-space _cache_ (aka v1), but rather, to claim that its status is
exactly the same as that of btrfs in general, stabilizing but not fully
stable, workable in general for daily use as long as you keep your
backups updated and ready to use, and stay away from the known to be less
stable features... which do /not/ include the free-space cache (v1), but
/do/ include the free-space tree (v2).

And that cache (as opposed to tree) functionality really /is/ quite
stable, as it has been rather heavily tested by now.  The only exception
would be the usual one for new code over old, where the new code hasn't
been well tested, but that's a given for projects at this stage, so has
little need to be explicitly stated.

>
>> These warnings are however hints that something out of the routine has
>> happened
> Which again just likely means that there was/is some bug in btrfs...
> other than that, why should it suddenly get some corrupted cache, when
> only ro-snapshots were removed in between?

That wasn't plain to me in the message I replied to.  What I had in mind
with that out of the routine reference was an ungraceful shutdown or
crash, which /does/ commonly leave the free-space-cache in an
inconsistent state, that btrfs routinely detects and deals with,
invalidating and not using the section of cache that doesn't match what
it knows to be the case from the other trees.

And in such an ungraceful shutdown situation, exactly as I stated, the
free-space-cache warning is expected and dealt with routinely, but it's a
hint that something else might have gone wrong in the event as well, that
isn't necessarily so easily fixed, and that very well may /not/ be fixed
automatically, and further, that continuing to use the filesystem with
that problem still lurking can potentially cause further damage.

>> 2) It recently came to the attention of the devs that the existing
>> btrfs mount-option method of clearing the free-space cache only clears
>> it for block-groups/chunks it encounters on-the-fly.  It doesn't do a
>> systematic beginning-to-end clear.
> So that calls for fixing the documentation as well?!

It's documented already (in -progs 4.9) in the btrfs-check manpage, but
you are correct in that it's not documented in the btrfs (5) manpage,
which covers the mount options themselves.

On the wiki the manpages apparently haven't been regenerated from git
recently, so they're missing the 4.9 content mentioned above, unless you
follow the link in the warning at the top of each one, to the git
version.  The git version of the manpages appears to have the same status
as the 4.9 manpages, given above.

Of course if people are following this list as recommended, they'll know
about it as well, because they will have seen the recent discussion.  Tho
of course that's not going to help people who will be starting to
investigate btrfs in some weeks' time, unless they read the list archive
back far enough to see the discussion.

So it definitely needs to be documented in the btrfs (5) manpage ASAP,
with the wiki manpage versions regenerated after it hits git.

>> 3) As a result of #2, the devs only very recently added support in
>> btrfs check for a /full/ space-cache-v1 clear, using the new
>> --clear-space-cache option.  But your btrfs-progs v4.7.3 is too old to
>> support it.  I know it's in the v4.9 I just upgraded to... checking the
>> wiki it appears the option was added in btrfs-progs v4.8.3 (v4.8.4 for
>> v2 cache).
>
> And is the new option stable?! ;-)

The btrfs check option should be reasonably stable, yes.  Because it's a
full clear on an unmounted filesystem, which has far fewer ways to go
wrong than attempting to do a partial clear on a mounted filesystem.
Additionally, it has been there since 4.8.3, so thru that, 4.8.4, 4.8.5,
and now into 4.9.0, without noted problems.  So it should be reasonably
stable.

Put it this way, unlike most of the non-read-only options in btrfs check,
I'd be quite willing to use it on my own systems without worry about
risking further damage, should it be necessary.  And I tend to be pretty
cautious about using known-unstable or stability-questionable features
and options.

Of course there's always the chance that some bug might cause it to go
wildly wrong, but that's _precisely_ why nobody here seriously claims
that btrfs is fully stable and mature yet, and why keeping up with and
being willing to use backups should it become necessary is so strongly
recommended.

Given those parameters, I'd not hesitate at all to use the btrfs check
--clear-space-cache option on my own systems or recommend its use to
others, because I believe the risk of that specific option to be no more
than, and arguably relatively less than, that I'm already taking by
choosing to run btrfs in the first place.