* strange corruptions found during btrfs check
From: Christoph Anton Mitterer @ 2015-07-02 16:12 UTC
To: linux-btrfs

Hi.

This is on a btrfs created and used with a 4.0 kernel.
Not much was done on it, apart from send/receive of snapshots from another btrfs (with -p).
Some of the older snapshots (that were used as parents before) have been removed in the meantime.

Now a btrfs check gives this:
# btrfs check /dev/mapper/image
Checking filesystem on /dev/mapper/data-b
UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
checking extents
ref mismatch on [468697088 16384] extent item 0, found 1
Backref 468697088 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [468697088 16384]
owner ref check failed [468697088 16384]
ref mismatch on [1002373120 16384] extent item 0, found 1
Backref 1002373120 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1002373120 16384]
ref mismatch on [1013940224 16384] extent item 0, found 1
Backref 1013940224 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1013940224 16384]
ref mismatch on [525281738752 16384] extent item 0, found 1
Backref 525281738752 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525281738752 16384]
owner ref check failed [525281738752 16384]
ref mismatch on [525317095424 16384] extent item 0, found 1
Backref 525317095424 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525317095424 16384]
owner ref check failed [525317095424 16384]
ref mismatch on [525404700672 16384] extent item 0, found 1
Backref 525404700672 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525404700672 16384]
owner ref check failed [525404700672 16384]
ref mismatch on [525438025728 16384] extent item 0, found 1
Backref 525438025728 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525438025728 16384]
owner ref check failed [525438025728 16384]
ref mismatch on [525554302976 16384] extent item 0, found 1
Backref 525554302976 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525554302976 16384]
owner ref check failed [525554302976 16384]
ref mismatch on [525585235968 16384] extent item 0, found 1
Backref 525585235968 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525585235968 16384]
owner ref check failed [525585235968 16384]
ref mismatch on [830810521600 16384] extent item 0, found 1
Backref 830810521600 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [830810521600 16384]
owner ref check failed [830810521600 16384]
ref mismatch on [830895620096 16384] extent item 0, found 1
Backref 830895620096 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [830895620096 16384]
owner ref check failed [830895620096 16384]
ref mismatch on [1038383448064 16384] extent item 0, found 1
Backref 1038383448064 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1038383448064 16384]
owner ref check failed [1038383448064 16384]
ref mismatch on [1391733161984 16384] extent item 0, found 1
Backref 1391733161984 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1391733161984 16384]
ref mismatch on [1392008445952 16384] extent item 0, found 1
Backref 1392008445952 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392008445952 16384]
ref mismatch on [1392058843136 16384] extent item 0, found 1
Backref 1392058843136 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392058843136 16384]
ref mismatch on [1392058925056 16384] extent item 0, found 1
Backref 1392058925056 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392058925056 16384]
ref mismatch on [1466625753088 16384] extent item 0, found 1
Backref 1466625753088 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1466625753088 16384]
owner ref check failed [1466625753088 16384]
ref mismatch on [2857092792320 16384] extent item 0, found 1
Backref 2857092792320 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857092792320 16384]
owner ref check failed [2857092792320 16384]
ref mismatch on [2857095610368 16384] extent item 0, found 1
Backref 2857095610368 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857095610368 16384]
owner ref check failed [2857095610368 16384]
ref mismatch on [2857125183488 16384] extent item 0, found 1
Backref 2857125183488 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857125183488 16384]
owner ref check failed [2857125183488 16384]
ref mismatch on [2857127591936 16384] extent item 0, found 1
Backref 2857127591936 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857127591936 16384]
owner ref check failed [2857127591936 16384]
ref mismatch on [2857130393600 16384] extent item 0, found 1
Backref 2857130393600 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857130393600 16384]
owner ref check failed [2857130393600 16384]
ref mismatch on [2857138421760 16384] extent item 0, found 1
Backref 2857138421760 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857138421760 16384]
owner ref check failed [2857138421760 16384]
ref mismatch on [2857140436992 16384] extent item 0, found 1
Backref 2857140436992 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857140436992 16384]
owner ref check failed [2857140436992 16384]
ref mismatch on [2857153970176 16384] extent item 0, found 1
Backref 2857153970176 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857153970176 16384]
owner ref check failed [2857153970176 16384]
ref mismatch on [2857155837952 16384] extent item 0, found 1
Backref 2857155837952 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857155837952 16384]
owner ref check failed [2857155837952 16384]
ref mismatch on [2857157509120 16384] extent item 0, found 1
Backref 2857157509120 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857157509120 16384]
owner ref check failed [2857157509120 16384]
ref mismatch on [2857157836800 16384] extent item 0, found 1
Backref 2857157836800 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857157836800 16384]
owner ref check failed [2857157836800 16384]
ref mismatch on [2857160605696 16384] extent item 0, found 1
Backref 2857160605696 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857160605696 16384]
owner ref check failed [2857160605696 16384]
ref mismatch on [2857164636160 16384] extent item 0, found 1
Backref 2857164636160 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857164636160 16384]
owner ref check failed [2857164636160 16384]
ref mismatch on [2857167716352 16384] extent item 0, found 1
Backref 2857167716352 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857167716352 16384]
owner ref check failed [2857167716352 16384]
ref mismatch on [2857168977920 16384] extent item 0, found 1
Backref 2857168977920 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857168977920 16384]
owner ref check failed [2857168977920 16384]
ref mismatch on [2857175269376 16384] extent item 0, found 1
Backref 2857175269376 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857175269376 16384]
owner ref check failed [2857175269376 16384]
ref mismatch on [2857239904256 16384] extent item 0, found 1
Backref 2857239904256 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857239904256 16384]
owner ref check failed [2857239904256 16384]
ref mismatch on [2857241640960 16384] extent item 0, found 1
Backref 2857241640960 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857241640960 16384]
owner ref check failed [2857241640960 16384]
ref mismatch on [2857244409856 16384] extent item 0, found 1
Backref 2857244409856 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857244409856 16384]
owner ref check failed [2857244409856 16384]
ref mismatch on [3955758170112 16384] extent item 0, found 1
Backref 3955758170112 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [3955758170112 16384]
ref mismatch on [3955758563328 16384] extent item 0, found 1
Backref 3955758563328 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [3955758563328 16384]
owner ref check failed [3955758563328 16384]
ref mismatch on [3955758678016 16384] extent item 0, found 1
Backref 3955758678016 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [3955758678016 16384]
ref mismatch on [4420194156544 16384] extent item 0, found 1
Backref 4420194156544 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [4420194156544 16384]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
found 4945849393671 bytes used err is 0
total csum bytes: 4821697620
total tree bytes: 8430518272
total fs tree bytes: 2181136384
total extent tree bytes: 740294656
btree space waste bytes: 955676550
file data blocks allocated: 7672438157312
 referenced 7740136660992
btrfs-progs v4.0

Any ideas?

Cheers,
Chris.
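The log above is long and highly repetitive, so a quick tally helps when reading it. The Python sketch below is only an illustration (it is not part of btrfs-progs): it counts the "ref mismatch", missing "Backref" and "owner ref check failed" lines and groups the missing backrefs by the (parent, root) pair they name. The regular expressions are an assumption written against the btrfs-progs v4.0 wording shown here, and other versions may phrase these messages differently.

```python
import re
from collections import Counter

# Patterns written against the btrfs-progs v4.0 wording in the log above;
# other btrfs-progs versions may phrase these messages differently.
REF_MISMATCH = re.compile(r"ref mismatch on \[(\d+) (\d+)\] extent item (\d+), found (\d+)")
BACKREF_MISSING = re.compile(r"Backref (\d+) parent (\d+) root (\d+) not found in extent tree")
OWNER_FAILED = re.compile(r"owner ref check failed \[(\d+) (\d+)\]")


def summarize_check_log(text: str) -> dict:
    """Tally extent-tree complaints found in captured btrfs check output."""
    # (bytenr, length) of every extent reported with a ref mismatch.
    mismatches = [(int(m.group(1)), int(m.group(2))) for m in REF_MISMATCH.finditer(text)]
    # Group the missing backrefs by the (parent, root) pair they name.
    missing = Counter(
        (int(m.group(2)), int(m.group(3))) for m in BACKREF_MISSING.finditer(text)
    )
    return {
        "ref_mismatches": len(mismatches),
        "extent_sizes": sorted({length for _, length in mismatches}),
        "missing_backrefs_by_parent_root": dict(missing),
        "owner_ref_failures": len(OWNER_FAILED.findall(text)),
    }
```

Passing it the full log above as a string makes it easy to confirm that every missing backref names the same parent/root pair, (4159, 4159), and that every reported extent is 16384 bytes long.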
* Re: strange corruptions found during btrfs check
From: Christoph Anton Mitterer @ 2015-07-06 18:40 UTC
To: linux-btrfs

After removing some of the snapshots that were received, the errors from
btrfs check went away.

Is there some list of features in btrfs which are considered stable?
Because I thought send/receive and the subvolumes would be, but apparently
this doesn't seem to be the case :-/

Cheers,
Chris.
* Re: strange corruptions found during btrfs check
From: Duncan @ 2015-07-07 0:47 UTC
To: linux-btrfs

Christoph Anton Mitterer posted on Mon, 06 Jul 2015 20:40:23 +0200 as
excerpted:

> After removing some of the snapshots that were received, the errors from
> btrfs check went away.
>
> Is there some list of features in btrfs which are considered stable?
> Because I thought send/receive and the subvolumes would be, but apparently
> this doesn't seem to be the case :-/

[List-regular non-developer but btrfs-using admin answer.]

I know of no such list, per se. There are, however, features that are known to still be very actively worked on, either because they have only recently reached nominal code-completion (raid56 mode), or because they are simply complicated problems, possibly having to be redone with a new approach as the devs learned more about the issues with the existing approach. This list would include:

raid56 mode (new)

quotas (on, I think, their second partial rewrite, third approach, now)

send/receive (there are simply very many, very complex corner-cases to find and deal with)

Subvolumes/snapshots should, however, be reasonably stable, since their basis is pretty close to that of btrfs itself (b-trees and COW), and the hooks for managing them (the UI) have been established for some time. The problems involving subvolumes/snapshots aren't so much in that subsystem as in whatever other subsystems are involved as well. The interaction between quotas and subvolumes has been a problem point, for instance, and snapshot-aware defrag continues to be disabled ATM, as it simply didn't scale due to problems in other areas (quotas being one of them).
The interaction between send/receive and subvolumes/snapshots is also a problem, but again, not so much on the subvolume/snapshot side, as on the send/receive side.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: strange corruptions found during btrfs check
From: Christoph Anton Mitterer @ 2015-07-07 1:03 UTC
To: Duncan, linux-btrfs

On Tue, 2015-07-07 at 00:47 +0000, Duncan wrote:
> The interaction between send/receive and subvolumes/snapshots
> is also a problem, but again, not so much on the subvolume/snapshot
> side, as on the send/receive side.

Well, I haven't looked into any code, so the following is just
perception: It seemed that send/receive itself has always worked
correctly for me so far.
I.e. I ran a complete diff -qr over the source and target of an
already incrementally (-p) sent/received snapshot.
That brought no error.

The aforementioned btrfs check errors only occurred after I had removed
older snapshots on the receiving side, i.e. snapshots that btrfs, via
the -p <same-old-snapshot-on-the-send-side>, used for building
the more recent snapshot.

The error messages seem to imply that some of that got lost... or at
least that would be my first wild guess... as if refs in the newer
snapshot on the receiving side point into the void, as the older
snapshot's objects they were pointing to have been removed (or some of
them lost).

Apart from that, I think it's quite an issue that the core developers
don't keep a well-maintained list of working/experimental features...
that's nearly as problematic as the complete lack of good and extensive
end-user (i.e. sysadmin) documentation.
btrfs has been around for quite a while now, and people are starting to
use it... but when they cannot really tell what's stable and what's not
(or which parts of e.g. raid56 still need polishing) and then
stumble over problems, trust in btrfs is easily lost. :(

Cheers,
Chris.
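The verification Christoph describes, a recursive diff -qr between the snapshot on the sending side and its received copy, can also be scripted. The following is a minimal Python sketch using the standard filecmp module; the function name diff_qr is hypothetical. Like diff -qr, it compares file contents only (not ownership, permissions, timestamps or xattrs), and it follows symlinks rather than comparing link targets.

```python
import filecmp
import os


def diff_qr(left: str, right: str) -> list[str]:
    """Approximate `diff -qr left right`: report one-sided and differing paths."""
    problems: list[str] = []

    def walk(dc: filecmp.dircmp, rel: str) -> None:
        left_dir = os.path.normpath(os.path.join(left, rel))
        right_dir = os.path.normpath(os.path.join(right, rel))
        for name in sorted(dc.left_only):
            problems.append(f"Only in {left_dir}: {name}")
        for name in sorted(dc.right_only):
            problems.append(f"Only in {right_dir}: {name}")
        for name in sorted(dc.common_funny):
            # Same name on both sides, but different file types (or stat failed).
            problems.append(f"Type mismatch or unreadable: {os.path.join(left_dir, name)}")
        for name in sorted(dc.common_files):
            a = os.path.join(left_dir, name)
            b = os.path.join(right_dir, name)
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting matching size and mtime.
            if not filecmp.cmp(a, b, shallow=False):
                problems.append(f"Files {a} and {b} differ")
        for name, sub in sorted(dc.subdirs.items()):
            walk(sub, os.path.join(rel, name))

    # ignore=[] disables filecmp's default ignore list (.git, CVS, ...),
    # which diff -qr does not have.
    walk(filecmp.dircmp(left, right, ignore=[]), "")
    return problems
```

An empty result corresponds to diff -qr printing nothing. Note that a clean comparison only shows the received files match; as the thread goes on to discuss, it says nothing about the extent-tree bookkeeping that btrfs check examines.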
* Re: strange corruptions found during btrfs check
From: Duncan @ 2015-07-07 2:08 UTC
To: linux-btrfs

Christoph Anton Mitterer posted on Tue, 07 Jul 2015 03:03:25 +0200 as
excerpted:

> Well, I haven't looked into any code, so the following is just
> perception: It seemed that send/receive itself has always worked
> correctly for me so far.
> I.e. I ran a complete diff -qr over the source and target of an
> already incrementally (-p) sent/received snapshot.
> That brought no error.

In general, the send/receive corner-cases are of the type where, if both the send and the receive complete successfully, the result should be reliable; it's just that sometimes they won't complete successfully.

> The aforementioned btrfs check errors only occurred after I had removed
> older snapshots on the receiving side, i.e. snapshots that btrfs, via
> the -p <same-old-snapshot-on-the-send-side>, used for building
> the more recent snapshot.
>
> The error messages seem to imply that some of that got lost... or at
> least that would be my first wild guess... as if refs in the newer
> snapshot on the receiving side point into the void, as the older
> snapshot's objects they were pointing to have been removed (or some of
> them lost).

That would imply either a general btrfs bug (see the stability discussion below) or perhaps a below-filesystem error that happened to be exposed by the snapshot deletion. It does look like a snapshot subsystem error, agreed, and conceivably it could even be one at some level.
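The guess quoted above (refs in the newer snapshot pointing into the void because shared extents were freed along with the older snapshot) amounts to a shared extent whose reference count was too low. The Python toy model below illustrates only that failure mode: each extent carries a plain counter and is freed when the counter reaches zero. This is a deliberate simplification and an assumption of the sketch; real btrfs keeps a reference count together with back-reference items in its extent tree, and nothing here reproduces that bookkeeping. The class and function names and the extent numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExtentTree:
    """Toy model: a counter per shared extent; the extent is freed at zero."""
    refs: dict[int, int] = field(default_factory=dict)  # bytenr -> reference count
    freed: set[int] = field(default_factory=set)

    def add_ref(self, bytenr: int) -> None:
        self.refs[bytenr] = self.refs.get(bytenr, 0) + 1

    def drop_ref(self, bytenr: int) -> None:
        self.refs[bytenr] -= 1
        if self.refs[bytenr] == 0:
            # Last *recorded* user is gone, so the space is reclaimed,
            # whether or not an unrecorded user still points here.
            del self.refs[bytenr]
            self.freed.add(bytenr)


def delete_snapshot(tree: ToyExtentTree, snapshot: set[int]) -> None:
    """Drop the deleted snapshot's reference to every extent it used."""
    for bytenr in snapshot:
        tree.drop_ref(bytenr)


def dangling(tree: ToyExtentTree, snapshot: set[int]) -> set[int]:
    """Extents a surviving snapshot still points at that have been freed."""
    return snapshot & tree.freed


if __name__ == "__main__":
    parent = {1, 2}     # hypothetical extents of the older (parent) snapshot
    child = {1, 2, 3}   # the newer snapshot shares extents 1 and 2 with it

    correct = ToyExtentTree()
    for snap in (parent, child):
        for extent in snap:
            correct.add_ref(extent)
    delete_snapshot(correct, parent)
    print("correct accounting, dangling:", dangling(correct, child))  # set()

    buggy = ToyExtentTree()
    for extent in parent:
        buggy.add_ref(extent)
    for extent in child - {1}:  # bug: the child's reference to extent 1 is never counted
        buggy.add_ref(extent)
    delete_snapshot(buggy, parent)
    print("undercounted, dangling:", dangling(buggy, child))  # {1}
```

With correct accounting, deleting the parent leaves every shared extent alive; with a single missing reference, the same deletion frees an extent the surviving snapshot still uses. Because the counter belongs to the extent rather than to either snapshot, the same bug would bite any other sharer of that extent too.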
However, the point I sort of made, but not well, in the previous reply was that the snapshot and subvolume subsystem is so reliant on the core assumptions that btrfs itself makes about copy-on-write, etc., that the two cores really can't be easily separated. So if deletion of a particular snapshot actually deletes extents pointed to by another snapshot, that's not a problem with the subvolume/snapshot system so much as with btrfs itself.

What /might/ be happening is that an extent usage reference count was somehow too low, such that when the snapshot was removed, the reference count decremented to zero and btrfs thus thought it safe to remove the actual data extents as well. However, shared extents are actually a core feature of btrfs itself, relied upon not just by snapshots/subvolumes but also, for instance, by cp --reflink=always when both instances of the file are on the same subvolume. So while such a reference-count bug could certainly trigger with snapshot deletion, it wouldn't be a snapshot subsystem bug, but rather a bug in core btrfs itself.

The snapshot/subvolume subsystem, then, should be as stable as btrfs itself is, the point I made in my original reply; but again, more on that below.

> Apart from that, I think it's quite an issue that the core developers
> don't keep a well-maintained list of working/experimental features...
> that's nearly as problematic as the complete lack of good and extensive
> end-user (i.e. sysadmin) documentation.
> btrfs has been around for quite a while now, and people are starting to
> use it... but when they cannot really tell what's stable and what's not
> (or which parts of e.g. raid56 still need polishing) and then
> stumble over problems, trust in btrfs is easily lost. :(

Actually, that's a bit of a sore spot... Various warnings about btrfs being experimental, in mkfs.btrfs, in the kernel config help text for btrfs, etc., are indeed being removed, though some of us think that may be a bit premature.
And various distros are now shipping btrfs as the default for one or more of their default partitions. openSUSE, for example, ships btrfs for the system partition, to enable update rollbacks via btrfs snapshotting, among other things.

But btrfs itself remains under very heavy development. As I've expanded upon in previous posts, due to the dangers of premature optimization, perhaps one of the most direct measures of when the _developers_ consider something stable is whether they've done production-level optimizations in areas where pre-production code may well change, since if they optimize and the code then does change, they lose those optimizations and must recode them.

As an example, one reasonably well-known optimization point in btrfs is the raid1-mode read scheduler, which decides which device a read is served from. Btrfs' current scheduler implementation is very simple and very easy to test; it simply chooses the first or second copy of the data based on even/odd PID. That works well enough as an initial scheduler: it's very simple to implement, it ensures both copies of the data get read over time, and it's easy to test, since selectively loading either side or both sides is as easy as choosing even or odd PIDs for the read test.

But for a single read task on an otherwise idle system, it's horrible: 50% of best-case throughput. And if your use-case happens to spawn multiple worker threads that are all even-PID or all odd-PID, one device is saturated while the other sits entirely idle! A simple and easily understood case of code that is obviously not yet production-optimized!

But kernel code already exists for a much better scheduler, generally agreed to be very well optimized: the one used by mdraid for its raid1 mode. So a well-tested, much better optimized solution is known and actually in use elsewhere in the kernel. Which pretty well demonstrates that the developers /themselves/ don't consider btrfs stable enough yet to do that sort of optimization.
Were they to do so and the raid1 implementation then changed, they'd have to redo that optimization, so they haven't done it yet, despite distributions already defaulting to btrfs and people already using it as if it were stable and production-ready.

Really, the best that can be said is that btrfs isn't yet completely stable, despite distros already shipping it by default.

However, that isn't such a bad problem, as it's stable /enough/ for good admin use, where a good admin by definition follows the admin's rule of backups: if the data isn't backed up, then by definition it's of less value than the time and resources required to do that backup, despite any claims to the contrary. And of course the corollary: for purposes of the above rule, a would-be backup that hasn't been tested restorable isn't yet a backup, because a backup isn't complete until it has been tested.

Because btrfs isn't yet entirely stable, that rule applies double. If you don't have backups, you /might/ lose the data, so best be prepared for it. But with that in mind, btrfs is stable /enough/. Many people use and depend on it to function normally in their daily routine, and btrfs is stable enough to do just that, as long as backups are available for valuable data, to cover the /non-/routine case.

As for end-user/admin documentation, there is a reasonable amount available on the btrfs wiki (https://btrfs.wiki.kernel.org), as well as all the articles in the Linux press that have covered btrfs over the years. Arguably, it's at an appropriate level for the state of btrfs itself, again, that being "not yet fully stable, but stabilizing". Between that and the btrfs list for questions not covered well enough on the wiki...
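The even/odd-PID raid1 read selection described above, and why it can leave one mirror idle, can be simulated in a few lines. The Python sketch below compares it with a generic "send the read to the mirror with the fewest queued reads" rule. That second policy is an assumption chosen for illustration; it is not a reimplementation of mdraid's raid1 read balancing, which weighs more factors. The PIDs are hypothetical, and reads are assumed never to complete during the run.

```python
def pid_parity_mirror(pid: int) -> int:
    """Pick mirror 0 or 1 from the reading task's PID parity."""
    return pid % 2


def least_pending_mirror(pending: list[int]) -> int:
    """Pick the mirror with the fewest queued reads (ties go to mirror 0)."""
    return min(range(len(pending)), key=lambda i: pending[i])


def simulate(pids: list[int], policy: str) -> list[int]:
    """Issue one read per entry in `pids`; return the reads queued per mirror.

    Simplifying assumption: no read completes during the run, so the
    pending counts only grow. Real devices drain their queues concurrently.
    """
    pending = [0, 0]  # two-copy raid1: mirror 0 and mirror 1
    for pid in pids:
        if policy == "pid_parity":
            mirror = pid_parity_mirror(pid)
        elif policy == "least_pending":
            mirror = least_pending_mirror(pending)
        else:
            raise ValueError(f"unknown policy: {policy}")
        pending[mirror] += 1
    return pending


if __name__ == "__main__":
    even_workers = [1000, 1002, 1004, 1006]  # hypothetical PIDs, all even
    single_reader = [1001] * 4               # one task issuing four reads
    for label, pids in (("all-even workers", even_workers), ("single reader", single_reader)):
        print(label, simulate(pids, "pid_parity"), simulate(pids, "least_pending"))
```

With four all-even worker PIDs the parity rule queues every read on mirror 0 ([4, 0]) while the queue-length rule splits them [2, 2]; a single task issuing four reads gets [0, 4] versus [2, 2]. Whether spreading one task's reads actually raises its throughput depends on those reads being in flight at the same time.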