* Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-21 16:16 UTC
To: linux-btrfs
Kernel 3.15.0-rc2, btrfs-progs 3.14.1
While doing some minor package updates my btrfs root partition [*]
decided to corrupt itself. There was no system crash, although I had
plenty of those (due to a USB-related regression) in recent weeks, none
of which caused any trouble.
At first only one of a package's folders was corrupted; any access to files
within it (incl. attempts to delete them) printed
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
to dmesg (I'm actually not sure about the numbers, but that was indeed
the error message). After I moved the folder out of the way, the partition
appeared to keep working as normal, and one reboot also worked fine.
Now I can't boot at all (beyond loading the kernel image located on
another partition), with either 3.15-rc2 or 3.14.1. Attempting to
mount the __current/ROOT subvolume on ArchLinux's current Live-CD
(kernel 3.13.7) prints
btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
btrfs: use ssd allocation scheme
btrfs: disk space caching is enabled
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS error (device sdc5): Error removing orphan entry, stopping orphan cleanup
BTRFS critical (device sdc5): could not do orphan cleanup -22
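When the same message repeats, it can help to check mechanically whether it always names the same leaf. The following is only a minimal sketch in Python: it assumes the message format is exactly as quoted above, and the function and field names are made up for illustration.

```python
import re

# Pattern for the kernel message as quoted in this report.
# The group names (block, root, slot) simply mirror the message text.
CORRUPT_LEAF = re.compile(
    r"corrupt leaf, slot offset bad: "
    r"block=(?P<block>\d+),\s*root=(?P<root>\d+),\s*slot=(?P<slot>\d+)"
)


def parse_corrupt_leaf(line):
    """Return (block, root, slot) as ints, or None if the line does not match."""
    match = CORRUPT_LEAF.search(line)
    if match is None:
        return None
    return int(match["block"]), int(match["root"]), int(match["slot"])


def summarize(lines):
    """Count how often each (block, root, slot) triple is reported."""
    counts = {}
    for line in lines:
        key = parse_corrupt_leaf(line)
        if key is not None:
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "btrfs: checking UUID tree",
        "btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88",
        "btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88",
    ]
    for (block, root, slot), count in summarize(sample).items():
        print(f"block={block} root={root} slot={slot}: seen {count} time(s)")
```

Feeding it saved dmesg output would show at a glance whether more than one block is affected.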
Running "btrfs check /dev/sdc5" first merely prints ten lines of the form
free space inode generation (0) did not match free space cache generation ([different transids between 40010 and 55578])
and then aborts with
checking fs roots
btrfs: cmds-check.c:1151: process_file_extent: Assertion `!(rec->ino != key->objectid || rec->refs > 1)' failed.
I'm reluctant to try any of the "btrfs check" options (or to mount with -o
recovery), since the last three times I did this (with other partitions)
it resulted in the partition becoming entirely trashed, while before that
at least "btrfs restore" still managed to extract some data each time.
The affected folder was one within /usr/include/qt4 (which I then moved
to /usr/BROKEN in order to reinstall the package successfully), i.e. on the
__current/ROOT subvolume.
That seems to be the only subvolume affected (so far). Mounting and accessing
the other three (__current/{var,home,opt}) still works.
[*] Organised following
http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html
(Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-21 19:13 UTC
To: linux-btrfs
Alright, turns out the partition does actually mount on 3.15-rc2 (error
messages remain, of course).
But systemd fails to continue booting because /bin/mount returns "exit
status 32", so / ends up ro; it can still be manually remounted rw.
Another error message I've spotted with 3.15 is
BTRFS error (device sdc5): error loading props for ino 1810424 (root 257): -5
I've now tried mounting with -o recovery and clear_cache; no effect.
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Duncan @ 2014-04-21 23:44 UTC
To: linux-btrfs
Andreas Reis posted on Mon, 21 Apr 2014 21:13:16 +0200 as excerpted:
> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns "exit
> status 32" and / thus ends as ro, yet can be manually remounted as rw.
The mount manpage says status 32 is mount failure. Dmesg should contain
more, but that's probably the errors you already mentioned.
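For reference, mount(8) documents its exit status as a bit mask, so a status value can be decoded mechanically. Below is a minimal Python sketch; the table and function names are made up for illustration, and the meanings are taken from the mount manpage.

```python
# Exit status bits as listed in mount(8); the values can be ORed together.
MOUNT_STATUS_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status):
    """Return the list of meanings for every bit set in a mount exit status."""
    if status == 0:
        return ["success"]
    return [text for bit, text in MOUNT_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    for meaning in decode_mount_status(32):
        print(meaning)
```

A status of 32 sets only the "mount failure" bit, which tells you that the mount failed but not why; the reason has to come from dmesg.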
So you're getting the read-only mount, but can't remount rw.
(This doesn't apply in your case, but FWIW, I now have my root filesystem
set up to be ro mounted by default, and have been running that way for
some months, now. Seems safer that way. The only time I remount / rw is
when I'm updating the system or changing something in the config, then I
normally remount ro again, altho after updating the system I normally
have to exit and restart X and kde as well as various system services
before I can remount ro, depending on what libraries got changed out from
under my running processes. Of course in order to make this work a
few /var/ subdirs that need to be writable are actually symlinks to
/home/var/ subdirs, /var/log is a dedicated writable logging partition of
its own, etc. So a read-only rootfs is the /normal/ case for me, and
wouldn't interfere with normal operations at all. =:^)
> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5
That would be one of the new btrfs properties introduced in kernel 3.14.
See btrfs property list/get/set... Unless you've set individual file
properties (such as compress), that's probably a property (such as ro/rw)
on a subvolume, or possibly on the main filesystem (label, etc).
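As an aside on reading such messages: the trailing -5 here, and the -22 in the earlier "could not do orphan cleanup" line, look like negated errno values, which is how the kernel conventionally reports errors. Assuming that reading is right, a small Python sketch can translate them (the helper name is hypothetical):

```python
import errno
import os


def decode_kernel_errno(value):
    """Map a (possibly negated) errno value to its symbolic name and description."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    for value in (-5, -22):
        name, text = decode_kernel_errno(value)
        print(f"{value}: {name} ({text})")
```

The symbolic names come from Python's errno table for the platform it runs on, so this is best run on a Linux system.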
Meanwhile, "orphans" normally refer to files that are deleted while
they're still in use. Normally, these will be libraries, etc, replaced
during a system upgrade, but still in use by running programs. Once all
such running programs have been restarted (loading the new version of the
library) or terminated, the filesystem can be unmounted or remounted read-
only. In the event they're not fully cleaned up at umount time, they are
normally cleaned up after reboot, when a filesystem is first mounted
writable once again.
Obviously there's a problem with one of these orphans, and attempts to
clean it up are failing, causing the remount rw to fail.
While that doesn't help with fixing the problem, it should at least give
you some idea of what's going on, and how to interpret the messages and
errors you see.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-22 18:16 UTC
To: linux-btrfs
Same failure with btrfs-progs from integration-20140421 (apart from the
line number 1156).
Can I get a bit of input on this? Is it safe to just ignore the error
for now (as I'm doing atm), i.e. remount as rw to skip the orphan cleanup?
Might it even be safe to call btrfs check --repair on the partition? I'm
not keen on that failing mid-process at the same assertion and thus
breaking the partition over a bunch of minor files, just as happened with
my previous btrfs partitions.
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Duncan @ 2014-04-23 2:55 UTC
To: linux-btrfs
Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:
> Same failure with btrfs-progs from integration-20140421 (apart from the
> line number 1156).
>
> Can I get a bit of input on this? Is it safe to just ignore the error
> for now (as I'm doing atm), ie. remount as rw to skip the orphan
> cleanup?
I explained orphans in my other reply. Since they're simply not yet
completed file deletions, it should be /relatively/ safe to continue
ignoring and doing the manual remount rw, since that continues to work.
"Relatively" as in that's what I'd do in the shorter term here were I
seeing the problem, tho I'd ensure my backups were current and tested, as
should be the case on btrfs anyway since it's not entirely stable yet.
And because I don't like half-dealt-with problems left lying around, and
the error would nag at me until I'd cleared it, I'd very likely mkfs and
restore from those backups sooner rather than later. But I'd certainly be
willing to continue running from the partition short term, for a week or
so until I had a chance to do the mkfs.btrfs and restore from backup, as
long as that remained the only issue I was seeing.
> Might it even be safe to call btrfs check --repair on the partition? I'm
> not keen on that failing mid-process at the same assertion and thus
> breaking it over a bunch of minor files, just like it happened with my
> previous btrfs partitions.
That I can't say. Based on reports and the common knowledge of the list,
I've become rather leery of btrfs check --repair myself, and tend to rely
on scrub and balance to fix issues if they can, and beyond that,
mkfs.btrfs and restore from backup. In fact, while btrfs check without
the --repair is safe as it's read-only, I don't run it regularly either,
because I know should it report problems I'd then be worried about things
I might have no reasonable way to fix, that obviously aren't causing me
problems anyway. Basically, if mounting and regular use of the
filesystem isn't giving me anything unusual in dmesg, I consider it good,
and for the most part I tend to route around btrfs check entirely, as
if it weren't even there, tho I've run it in default read-only mode a few
times, to compare my output with a post from the list or something,
always with a clean bill of health from btrfs check when I have run it.
That said, if you have backups tested and ready anyway, and would
otherwise be doing a mkfs.btrfs in short order in order to get rid of
those bad orphan warnings anyway, I don't see the harm in running it,
since at that point it's zero risk anyway. If you lose the filesystem as
a result, big deal, as you were going to mkfs.btrfs and restore from
backup anyway, and if it fixes the problem, well, you saved yourself the
hassle.
Plus, either way you can report back the results and then we'll know
whether it's safe to recommend btrfs check for the next report, or not.
=:^)
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
From: Andreas Reis @ 2014-04-23 15:02 UTC
To: linux-btrfs
Ah. Thank you for the replies. I didn't get them as mails and spinics
didn't update the thread until yesterday.
So I take it that the recommended course of action is not to wait for
any more or less unlikely btrfs-progs fix, but to try --repair and be
ready to restore from backup, too. Darn, and that over what probably
doesn't amount to more than a few dozen KB. Wish I could simply replace
the single subvolume instead, but I suppose that's one of btrfs's drawbacks.
I did a full partition backup some three weeks ago, so I'll have to
spend some hours to figure out what has changed since then, and how to
do incremental backups of it to different devices for the next time…
I don't have the time atm though; it'll probably take at least a week
(unless the partition decides to die) to report back.
As a side note, there was an ostensibly similar issue fixed in 2012:
https://bugzilla.novell.com/show_bug.cgi?id=760279 Guess that was a
different underlying issue, though.
* Re: Bug:
From: Andreas Reis @ 2014-04-25 2:04 UTC
To: linux-btrfs
Duncan <1i5t5.duncan <at> cox.net> writes:
> Plus, either way you can report back the results and then we'll know
> whether it's safe to recommend btrfs check for the next report, or not.
> =:^)
Well this is just bloody brilliant.
I ran btrfs check --repair using btrfs-progs from integration with a
bunch of fixes from this list applied. It failed at the same assert,
but otherwise left the partition unchanged, i.e. mountable.
So as planned, thinking I have a relatively fresh backup of the
whole partition (via partclone.btrfs), I go on restoring it to
get rid of the errors.
partclone does its thing, the restored partition mounts, text
files are properly readable (!) and btrfs check reports no
errors.
Then on reboot, the kernel (residing on another partition)
instantly crashes: "Input/Output error".
Turns out that when I try to run any binary from the restored
partition (via LiveCD), *every* *single* *one* fails with this
remarkably expressive error. If I manually replace one with a
fresh download, I get a SIGBUS crash instead.
Oh, and upon accessing any of said binaries, dmesg prints a BTRFS
info that csum failed. But only for binaries.
Yay. No idea how to proceed from here, but I guess this might not
necessarily be related to btrfs. Certainly doesn't make me want
to recommend it in the foreseeable future, though.
* Re: Bug: Partition borked
From: Andreas Reis @ 2014-04-25 2:43 UTC
To: linux-btrfs
Andreas Reis <andreas.reis <at> gmail.com> writes:
> Turns out that when I try to run any binary from the restored
> partition (via LiveCD), *every* *single* *one* fails with this
> remarkably expressive error. If I manually replace one with a
> fresh download, I get a SIGBUS crash instead.
Alright, there are corrupt text files too, after all. As well as a
handful of non-corrupted binaries.
Always the same type of btrfs error message though. Interestingly,
the false csum reported stays exactly the same: 2566472073. Also,
btrfs check --init-csum-tree fails with a plethora of backref
errors.
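One way to investigate a checksum value that never changes is to compare it with the checksum of some trivial block, for instance one filled entirely with zeros. The sketch below is a plain bitwise CRC-32C (Castagnoli) in Python. It assumes that the data checksums are CRC-32C computed over 4096-byte blocks and that the number printed in dmesg is the finalized checksum; neither assumption is confirmed anywhere in this thread, and the function name is my own.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    reported = 2566472073  # the constant csum value from dmesg
    zero_block = bytes(4096)  # assumed block size
    computed = crc32c(zero_block)
    print(f"reported: {reported:#010x}")
    print(f"zeros:    {computed:#010x}")
    print("match" if computed == reported else "no match")
```

If the two values matched, that would hint that the affected blocks read back as all zeros, but that would be a lead to follow up rather than a diagnosis.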
Guess it doesn't matter whether it's the backup or the LiveCD's
kernel (3.13.7) that's at fault, I'm going to have to reinstall
either way.
* Re: Bug: Partition borked
From: Chris Murphy @ 2014-04-25 3:03 UTC
To: Andreas Reis; +Cc: linux-btrfs
On Apr 24, 2014, at 8:43 PM, Andreas Reis <andreas.reis@gmail.com> wrote:
> Andreas Reis <andreas.reis <at> gmail.com> writes:
>
>> Turns out that when I try to run any binary from the restored
>> partition (via LiveCD), *every* *single* *one* fails with this
>> remarkably expressive error. If I manually replace one with a
>> fresh download, I get a SIGBUS crash instead.
>
> Alright, there are corrupt text files too, after all. As well as a
> handful of non-corrupted binaries.
>
> Always the same type of btrfs error message though. Interestingly,
> the false csum reported stays exactly the same: 2566472073. Also,
> btrfs check --init-csum-tree fails with a plethora of backref
> errors.
That command obliterates the csum tree. csums are not recomputed for already written files. Any time you read an existing file, e.g. merely copy it, you'll get a long pile of csum errors because the csums are missing.
btrfs check itself is benign, but the options --init* and --repair have been fairly vertical fixes for specific problems and can make others worse; although that experience is largely based on older progs. I'm not sure yet how well 3.14 is repairing, and haven't looked at the changelog to see if btrfsck has been significantly updated in it.
Chris Murphy