* Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-21 16:16 UTC
To: linux-btrfs
Kernel 3.15.0-rc2, btrfs-progs 3.14.1
While doing some minor package updates my btrfs root partition [*]
decided to corrupt itself. There was no system crash, although I had
plenty of those (due to a USB-related regression) in recent weeks, none
of which caused any trouble.
At first only one of a package's folders was corrupted; any access to
files within it (incl. attempts to delete them) printed
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
to dmesg (I'm actually not sure about the numbers, but that was indeed
the error message). After moving the folder out of the way the partition
continued to appear working as normal, one reboot also worked fine.
Now I can't boot at all (beyond loading the kernel image located on
another partition), with either 3.15-rc2 or 3.14.1. Attempting to
mount the __current/ROOT subvolume on ArchLinux's current Live-CD
(kernel 3.13.7) prints
btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
btrfs: use ssd allocation scheme
btrfs: disk space caching is enabled
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS error (device sdc5): Error removing orphan entry, stopping orphan
cleanup
BTRFS critical (device sdc5): could not do orphan cleanup -22
Doing "btrfs check /dev/sdc5" merely first prints ten
free space inode generation (0) did not match free space cache
generation ([different transids between 40010 and 55578])
to then abort with
checking fs roots
btrfs: cmds-check.c:1151: process_file_extent: Assertion `!(rec->ino !=
key->objectid || rec->refs > 1)' failed.
I'm reluctant to try any of the "btrfs check" options (or to mount with -o
recovery) since the last three times I did this (with other partitions)
it resulted in the partition becoming entirely trashed, while before at
least "btrfs restore" still managed to extract some data each time.
The affected folder was one within /usr/include/qt4 (which I then moved
to /usr/BROKEN, to successfully reinstall the package), ie. on the
__current/ROOT subvolume.
That seems to be the only subvolume affected (so far). Mounting & accessing the
other three (__current/{var,home,opt}) still works.
[*] Organised following
http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html
(Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )
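For anyone triaging a similar log, here is a minimal Python sketch (not part of the original report) that pulls the block, root and slot numbers out of "corrupt leaf" lines and counts the repeats. The regular expression is an assumption based only on the message format quoted above, and the function name count_corrupt_leaves is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the lines quoted in this report, e.g.
# "btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88"
CORRUPT_LEAF = re.compile(
    r"corrupt leaf, slot offset bad: block=(\d+),\s*root=(\d+),\s*slot=(\d+)"
)


def count_corrupt_leaves(log_text):
    """Return a Counter keyed by (block, root, slot) for every matching line."""
    hits = Counter()
    for match in CORRUPT_LEAF.finditer(log_text):
        block, root, slot = (int(group) for group in match.groups())
        hits[(block, root, slot)] += 1
    return hits


if __name__ == "__main__":
    sample = (
        "btrfs: checking UUID tree\n"
        "btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88\n"
        "btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88\n"
    )
    for (block, root, slot), seen in count_corrupt_leaves(sample).items():
        print(f"block={block} root={root} slot={slot} seen {seen}x")
```

When every hit names the same block, as in the log above, that at least suggests the complaints all concern a single leaf rather than damage spread across the tree.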
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-21 19:13 UTC
To: linux-btrfs
Alright, turns out the partition does actually mount on 3.15-rc2 (error
messages remain, of course).
But systemd will fail to continue booting as /bin/mount returns "exit
status 32" and / thus ends as ro, yet can be manually remounted as rw.
Another error message I've spotted with 3.15 is
BTRFS error (device sdc5): error loading props for ino 1810424 (root
257): -5
I've now tried to mount with -o recovery and clear_cache, no effect.
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Duncan @ 2014-04-21 23:44 UTC
To: linux-btrfs
Andreas Reis posted on Mon, 21 Apr 2014 21:13:16 +0200 as excerpted:
> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns "exit
> status 32" and / thus ends as ro, yet can be manually remounted as rw.
The mount manpage says status 32 is mount failure. Dmesg should contain
more, but that's probably the errors you already mentioned. So you're
getting the read-only mount, but can't remount rw.
(This doesn't apply in your case, but FWIW, I now have my root
filesystem set up to be ro mounted by default, and have been running that
way for some months now. Seems safer that way. The only time I remount
/ rw is when I'm updating the system or changing something in the
config, then I normally remount ro again, altho after updating the
system I normally have to exit and restart X and kde as well as various
system services before I can remount ro, depending on what libraries got
changed out from under my running processes. Of course in order to make
this work a few /var/ subdirs that need to be writable are actually
symlinks to /home/var/ subdirs, /var/log is a dedicated writable logging
partition of its own, etc. So a read-only rootfs is the /normal/ case
for me, and wouldn't interfere with normal operations at all. =:^)
> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5
That would be one of the new btrfs properties introduced in kernel 3.14.
See btrfs property list/get/set... Unless you've set individual file
properties (such as compress), that's probably a property (such as
ro/rw) on a subvolume, or possibly on the main filesystem (label, etc).
Meanwhile, "orphans" normally refer to files that are deleted while
they're still in use. Normally, these will be libraries, etc, replaced
during a system upgrade, but still in use by running programs. Once all
such running programs have been restarted (loading the new version of
the library) or terminated, the filesystem can be unmounted or remounted
read-only. In the event they're not fully cleaned up at umount time,
they are normally cleaned up after reboot, when a filesystem is first
mounted writable once again.
Obviously there's a problem with one of these orphans, and attempts to
clean it up are failing, causing the remount rw to fail.
While that doesn't help with fixing the problem, it should at least give
you some idea of what's going on, and how to interpret the messages and
errors you see.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
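On the "exit status 32" point: mount(8) documents its return code as a set of bits that can be ORed together, so a status can carry more than one meaning. A small Python sketch of decoding it follows; the bit meanings are taken from the manual page, while the names MOUNT_STATUS_BITS and decode_mount_status are made up for illustration.

```python
# Bit meanings as listed in the "exit status" section of mount(8).
MOUNT_STATUS_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status):
    """Return the list of mount(8) conditions encoded in an exit status."""
    if status == 0:
        return ["success"]
    return [
        text
        for bit, text in sorted(MOUNT_STATUS_BITS.items())
        if status & bit
    ]


if __name__ == "__main__":
    for status in (0, 32, 96):
        print(status, decode_mount_status(status))
```

A plain 32, as reported in this thread, decodes to the single condition "mount failure", so the details have to come from dmesg rather than from the status itself.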
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-22 18:16 UTC
To: linux-btrfs
Same failure with btrfs-progs from integration-20140421 (apart from the
line number 1156).
Can I get a bit of input on this? Is it safe to just ignore the error
for now (as I'm doing atm), ie. remount as rw to skip the orphan
cleanup?
Might it even be safe to call btrfs check --repair on the partition? I'm
not keen on that failing mid-process at the same assertion and thus
breaking it over a bunch of minor files, just like it happened with my
previous btrfs partitions.
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Duncan @ 2014-04-23 2:55 UTC
To: linux-btrfs
Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:
> Same failure with btrfs-progs from integration-20140421 (apart from the
> line number 1156).
>
> Can I get a bit of input on this? Is it safe to just ignore the error
> for now (as I'm doing atm), ie. remount as rw to skip the orphan
> cleanup?
I explained orphans in my other reply. Since they're simply not yet
completed file deletions, it should be /relatively/ safe to continue
ignoring and doing the manual remount rw, since that continues to work.
"Relatively" as in that's what I'd do in the shorter term here were I
seeing the problem, tho I'd ensure my backups were current and tested,
as should be the case on btrfs anyway since it's not entirely stable
yet, and just because I don't like nagging half-dealt-with problems left
laying around and the error would eat at me until I'd cleared it, at
some point likely rather sooner than later, I'd very likely mkfs and
restore from those backups. But I'd certainly be willing to continue
running from the partition short term, for a week or so until I had a
chance to do the mkfs.btrfs and restore from backup, as long as that
remained the only issue I was seeing.
> Might it even be safe to call btrfs check --repair on the partition? I'm
> not keen on that failing mid-process at the same assertion and thus
> breaking it over a bunch of minor files, just like it happened with my
> previous btrfs partitions.
That I can't say. Based on reports and the common knowledge of the list,
I've become rather leery of btrfs check --repair myself, and tend to
rely on scrub and balance to fix issues if they can, and beyond that,
mkfs.btrfs and restore from backup.
In fact, while btrfs check without the --repair is safe as it's
read-only, I don't run it regularly either, because I know should it
report problems I'd then be worried about things I might have no
reasonable way to fix, that obviously aren't causing me problems anyway.
Basically, if mounting and regular use of the filesystem isn't giving me
anything unusual in dmesg, I consider it good, and for the most part I
tend to route around btrfs check entirely, as if it weren't even there,
tho I've run it in default read-only mode a few times, to compare my
output with a post from the list or something, always with a clean bill
of health from btrfs check when I have run it.
That said, if you have backups tested and ready anyway, and would
otherwise be doing a mkfs.btrfs in short order in order to get rid of
those bad orphan warnings anyway, I don't see the harm in running it,
since at that point it's zero risk anyway. If you lose the filesystem as
a result, big deal, as you were going to mkfs.btrfs and restore from
backup anyway, and if it fixes the problem, well, you saved yourself the
hassle.
Plus, either way you can report back the results and then we'll know
whether it's safe to recommend btrfs check for the next report, or not.
=:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Bug:
From: Andreas Reis @ 2014-04-25 2:04 UTC
To: linux-btrfs
Duncan <1i5t5.duncan <at> cox.net> writes:
> Plus, either way you can report back the results and then we'll know
> whether it's safe to recommend btrfs check for the next report, or not.
> =:^)
Well this is just bloody brilliant.
I did btrfs check --repair with btrfs-progs from integration and a bunch
of fixes from this list applied. It failed at the same assert, but
otherwise left the partition unchanged, ie. mountable.
So as planned, thinking I have a relatively fresh backup of the whole
partition (via partclone.btrfs), I go on restoring it to get rid of the
errors. partclone does its thing, the restored partition mounts, text
files are properly readable (!) and btrfs check reports no errors.
Then on reboot, the kernel (residing on another partition) instantly
crashes: "Input/Output error".
Turns out that when I try to run any binary from the restored partition
(via LiveCD), *every* *single* *one* fails with this remarkably
expressive error. If I manually replace one with a fresh download, I get
a SIGBUS crash instead.
Oh, and upon accessing any of said binaries, dmesg prints a BTRFS info
that csum failed. But only for binaries. Yay.
No idea how to proceed from here, but I guess this might not necessarily
be related to btrfs. Certainly doesn't make me want to recommend it in
the foreseeable future, though.
* Re: Bug: Partition borked
From: Andreas Reis @ 2014-04-25 2:43 UTC
To: linux-btrfs
Andreas Reis <andreas.reis <at> gmail.com> writes:
> Turns out that when I try to run any binary from the restored
> partition (via LiveCD), *every* *single* *one* fails with this
> remarkably expressive error. If I manually replace one with a
> fresh download, I get a SIGBUS crash instead.
Alright, there are corrupt text files too, after all. As well as a
handful of non-corrupted binaries.
Always the same type of btrfs error message though. Interestingly, the
false csum reported stays exactly the same: 2566472073. Also, btrfs
check --init-csum-tree fails with a plethora of backref errors.
Guess it doesn't matter whether it's the backup or the LiveCD's kernel
(3.13.7) that's at fault, I'm going to have to reinstall either way.
* Re: Bug: Partition borked
From: Chris Murphy @ 2014-04-25 3:03 UTC
To: Andreas Reis
Cc: linux-btrfs
On Apr 24, 2014, at 8:43 PM, Andreas Reis <andreas.reis@gmail.com> wrote:
> Andreas Reis <andreas.reis <at> gmail.com> writes:
>
>> Turns out that when I try to run any binary from the restored
>> partition (via LiveCD), *every* *single* *one* fails with this
>> remarkably expressive error. If I manually replace one with a
>> fresh download, I get a SIGBUS crash instead.
>
> Alright, there are corrupt text files too, after all. As well as a
> handful of non-corrupted binaries.
>
> Always the same type of btrfs error message though. Interestingly,
> the false csum reported stays exactly the same: 2566472073. Also,
> btrfs check --init-csum-tree fails with a plethora of backref
> errors.
That command obliterates the csum tree. csums are not recomputed for
already written files. Anytime you read an existing file, e.g. merely
copy it, you'll get a long pile of csum errors because the csums are
missing.
btrfs check itself is benign, but the options --init* and --repair have
been fairly vertical fixes for specific problems and can make others
worse; although that experience is largely based on older progs. I'm not
sure yet how well 3.14 is repairing, and haven't looked at the changelog
to see if btrfsck has been significantly updated in it.
Chris Murphy
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-23 15:02 UTC
To: linux-btrfs
Ah. Thank you for the replies. I didn't get them as mails and spinics
didn't update the thread until yesterday.
So I take it that the recommended course of action is not to wait for
any more or less unlikely btrfs-progs fix, but to try --repair and be
ready to restore from backup, too.
Darn, and that over what probably doesn't amount to more than a few
dozen KB. Wish I could simply replace the single subvolume instead, but
I suppose that's one of btrfs's drawbacks.
I did a full partition backup some three weeks ago, so I'll have to
spend some hours to figure out what has changed since then, and how to
do incremental backups of it to different devices for the next time…
I don't have the time atm though; it'll probably take at least a week
(unless the partition decides to die) to report back.
As a side note, there was an ostensibly similar issue fixed in 2012:
https://bugzilla.novell.com/show_bug.cgi?id=760279
Guess that was a different underlying issue, though.
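For the "figure out what has changed since the backup" step, a rough Python sketch: walk a tree and list files whose modification time is newer than a cutoff. The function name changed_since, the /usr path and the three-week cutoff are placeholders, not anything taken from the setup described here. Note that package managers often preserve upstream mtimes, so this can miss files; treat it as a first pass only.

```python
import os
import time


def changed_since(root, cutoff_epoch):
    """Yield paths under `root` whose mtime is newer than `cutoff_epoch`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own timestamp.
                if os.lstat(path).st_mtime > cutoff_epoch:
                    yield path
            except OSError:
                # Unreadable entries (e.g. on a damaged filesystem) are skipped.
                continue


if __name__ == "__main__":
    three_weeks_ago = time.time() - 21 * 24 * 3600
    for path in changed_since("/usr", three_weeks_ago):
        print(path)
```

Skipping entries that raise OSError is a deliberate choice for a filesystem that already returns I/O errors; collecting those paths in a separate list would be an equally reasonable variant.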