* bug?
@ 2012-04-21 12:53 Thomas Weber
2012-04-24 15:26 ` bug? Josef Bacik
0 siblings, 1 reply; 4+ messages in thread
From: Thomas Weber @ 2012-04-21 12:53 UTC (permalink / raw)
To: linux-btrfs
Hello,
today my laptop crashed with the following output. Installed is
Archlinux with btrfs on a SSD.
Is it btrfs related?
Thanks,
Thomas
Apr 21 13:01:01 localhost anacron[3307]: Anacron started on 2012-04-21
Apr 21 13:01:01 localhost anacron[3307]: Will run job `cron.daily' in 48
min.
Apr 21 13:01:01 localhost anacron[3307]: Jobs will be executed sequentially
Apr 21 13:21:01 localhost -- MARK --
Apr 21 13:41:01 localhost -- MARK --
Apr 21 13:49:01 localhost anacron[3307]: Job `cron.daily' started
Apr 21 13:49:14 localhost kernel: [23420.297861] general protection
fault: 0000 [#1] PREEMPT SMP
Apr 21 13:49:14 localhost kernel: [23420.297976] CPU 1
Apr 21 13:49:14 localhost kernel: [23420.298007] Modules linked in:
nls_cp437 vfat fat usb_storage uas aes_x86_64 cryptd aes_generic fuse
ext4 jbd2 mbcache crc16 joydev arc4 dell_wmi sparse_keymap i915
snd_hda_codec_idt iwl3945 dell_laptop dcdbas iwl_legacy mac80211
snd_hda_intel i2c_algo_bit snd_hda_codec drm_kms_helper evdev snd_hwdep
snd_pcm drm serio_raw psmouse pcspkr tg3 snd_page_alloc cfg80211
snd_timer i2c_i801 iTCO_wdt iTCO_vendor_support snd i2c_core libphy
rfkill soundcore intel_agp intel_gtt wmi thermal button battery video
processor ac btrfs crc32c libcrc32c zlib_deflate sr_mod cdrom sd_mod
pata_acpi ata_generic ata_piix libata scsi_mod ehci_hcd uhci_hcd usbcore
usb_common
Apr 21 13:49:14 localhost kernel: [23420.299233]
Apr 21 13:49:14 localhost kernel: [23420.299262] Pid: 11172, comm:
updatedb Not tainted 3.2.11-1-ARCH #1 Dell Inc. Latitude
D530 /0HP728
Apr 21 13:49:14 localhost kernel: [23420.299410] RIP:
0010:[<ffffffffa0180ddd>] [<ffffffffa0180ddd>] btrfs_getattr+0x3d/0x90
[btrfs]
Apr 21 13:49:14 localhost kernel: [23420.299560] RSP:
0018:ffff880077765e38 EFLAGS: 00010206
Apr 21 13:49:14 localhost kernel: [23420.299630] RAX: 41d700000000fffe
RBX: ffff8800bf6c1550 RCX: 000000000000000c
Apr 21 13:49:14 localhost kernel: [23420.299717] RDX: ffff880077765f00
RSI: ffff880077765f00 RDI: ffff8800bf6c1550
Apr 21 13:49:14 localhost kernel: [23420.299804] RBP: ffff880077765e58
R08: ffffffff81173373 R09: ffff8800ba43bcf8
Apr 21 13:49:14 localhost kernel: [23420.299891] R10: ffff8800ba43bcc0
R11: 0000000000000005 R12: ffff880077765f00
Apr 21 13:49:14 localhost kernel: [23420.299978] R13: 0000000000001000
R14: ffff880077765f00 R15: 0000000001e54c50
Apr 21 13:49:14 localhost kernel: [23420.300067] FS:
00007f3e42b8f700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300168] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Apr 21 13:49:14 localhost kernel: [23420.300240] CR2: 0000000001e71ffc
CR3: 0000000058685000 CR4: 00000000000006e0
Apr 21 13:49:14 localhost kernel: [23420.300327] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300414] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 21 13:49:14 localhost kernel: [23420.300502] Process updatedb (pid:
11172, threadinfo ffff880077764000, task ffff88008a8023a0)
Apr 21 13:49:14 localhost kernel: [23420.300603] Stack:
Apr 21 13:49:14 localhost kernel: [23420.300635] ffff880077765f68
ffff8800ba43bcc0 ffff88011707bd00 ffff8800bf6c1550
Apr 21 13:49:14 localhost kernel: [23420.300753] ffff880077765e98
ffffffff8116c51e ffff880077765f00 0000000001e38319
Apr 21 13:49:14 localhost kernel: [23420.300869] ffff880077765f00
0000000001e38319 00007fffd821ca18 0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300985] Call Trace:
Apr 21 13:49:14 localhost kernel: [23420.301001] [<ffffffff8116c51e>]
vfs_getattr+0x4e/0x80
Apr 21 13:49:14 localhost kernel: [23420.301001] [<ffffffff8116c59e>]
vfs_fstatat+0x4e/0x70
Apr 21 13:49:14 localhost kernel: [23420.301001] [<ffffffff8116c5de>]
vfs_lstat+0x1e/0x20
Apr 21 13:49:14 localhost kernel: [23420.301001] [<ffffffff8116c77a>]
sys_newlstat+0x1a/0x40
Apr 21 13:49:14 localhost kernel: [23420.301001] [<ffffffff8145ddc2>]
system_call_fastpath+0x16/0x1b
Apr 21 13:49:14 localhost kernel: [23420.301001] Code: 6d f8 66 66 66 66
90 48 8b 5e 30 48 89 d6 49 89 d4 48 8b 43 28 48 89 df 44 8b 68 18 e8 5d
b2 fe e0 48 8b 83 60 fe ff ff 48 89 df <8b> 80 00 04 00 00 49 c7 44 24
58 00 10 00 00 41 89 44 24 08 e8
Apr 21 13:49:14 localhost kernel: [23420.301001] RIP
[<ffffffffa0180ddd>] btrfs_getattr+0x3d/0x90 [btrfs]
Apr 21 13:49:14 localhost kernel: [23420.301001] RSP <ffff880077765e38>
Apr 21 13:49:14 localhost anacron[3307]: Job `cron.daily' terminated
(exit status: 1) (mailing output)
Apr 21 13:49:14 localhost anacron[3307]: Can't find sendmail at
/usr/sbin/sendmail, not mailing output
Apr 21 13:49:14 localhost anacron[3307]: Normal exit (1 job run)
Apr 21 13:49:14 localhost kernel: [23420.365362] ---[ end trace
27dae2a049083cf1 ]---
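On the question of whether the fault is btrfs related: the RIP line of the trace above names the function and module where the faulting instruction lives. A minimal sketch for pulling those fields out of such a line, assuming the usual `symbol+offset/size [module]` layout seen in this oops; the helper name `parse_rip` is hypothetical and not part of any kernel tooling:

```python
import re

# Matches the symbol part of an oops "RIP" line, e.g.
#   btrfs_getattr+0x3d/0x90 [btrfs]
# Assumed layout: symbol+offset/size, optionally followed by [module].
RIP_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_rip(text):
    """Return (symbol, offset, size, module) for the first match, or None."""
    m = RIP_RE.search(text)
    if m is None:
        return None
    return (
        m.group("symbol"),
        int(m.group("offset"), 16),  # byte offset into the function
        int(m.group("size"), 16),    # total size of the function in bytes
        m.group("module"),           # None for symbols built into the kernel
    )

line = "RIP [<ffffffffa0180ddd>] btrfs_getattr+0x3d/0x90 [btrfs]"
print(parse_rip(line))
```

A non-empty module field of `btrfs` only shows where the fault was raised; it does not by itself say whether the root cause is in btrfs or in data handed to it.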
* Re: bug?
2012-04-21 12:53 bug? Thomas Weber
@ 2012-04-24 15:26 ` Josef Bacik
2012-04-24 15:47 ` bug? Thomas Weber
0 siblings, 1 reply; 4+ messages in thread
From: Josef Bacik @ 2012-04-24 15:26 UTC (permalink / raw)
To: Thomas Weber; +Cc: linux-btrfs
On Sat, Apr 21, 2012 at 02:53:55PM +0200, Thomas Weber wrote:
> Hello,
>
> today my laptop crashed with the following output. Installed is
> Archlinux with btrfs on a SSD.
> Is it btrfs related?
Sort of an old kernel, can you try on something recent? It doesn't look
familiar but who knows. Thanks,
Josef
* Re: bug?
2012-04-24 15:26 ` bug? Josef Bacik
@ 2012-04-24 15:47 ` Thomas Weber
0 siblings, 0 replies; 4+ messages in thread
From: Thomas Weber @ 2012-04-24 15:47 UTC (permalink / raw)
To: Josef Bacik; +Cc: Thomas Weber, linux-btrfs
Hello Josef,
On 04/24/2012 05:26 PM, Josef Bacik wrote:
> On Sat, Apr 21, 2012 at 02:53:55PM +0200, Thomas Weber wrote:
>> Hello,
>>
>> today my laptop crashed with the following output. Installed is
>> Archlinux with btrfs on a SSD.
>> Is it btrfs related?
> Sort of an old kernel, can you try on something recent? It doesn't look
> familiar but who knows. Thanks,
>
> Josef
I was on the 3.2 kernel because of the enospc problem. Today I updated
to 3.3.3 kernel.
Thomas
* Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
@ 2014-04-21 16:16 Andreas Reis
2014-04-21 19:13 ` Andreas Reis
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Reis @ 2014-04-21 16:16 UTC (permalink / raw)
To: linux-btrfs
Kernel 3.15.0-rc2, btrfs-progs 3.14.1
While doing some minor package updates my btrfs root partition [*]
decided to corrupt itself. There was no system crash, although I had
plenty of these (due to a USB-related regression) in recent weeks that
resulted in no trouble.
First only one of a package's folders was corrupted, any access to files
within (incl. attempts to delete) printed
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
to dmesg (I'm actually not sure about the numbers, but that was indeed
the error message). After moving the folder out of the way the partition
continued to appear working as normal, one reboot also worked fine.
Now I can't boot at all (beyond loading the kernel image located on
another partition), neither with 3.15-rc2 nor 3.14.1. Attempting to
mount the __current/ROOT subvolume on ArchLinux's current Live-CD
(kernel 3.13.7) prints
btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
btrfs: use ssd allocation scheme
btrfs: disk space caching is enabled
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS error (device sdc5): Error removing orphan entry, stopping orphan
cleanup
BTRFS critical (device sdc5): could not do orphan cleanup -22
Doing "btrfs check /dev/sdc5" merely first prints ten
free space inode generation (0) did not match free space cache
generation ([different transids between 40010 and 55578])
to then abort with
checking fs roots
btrfs: cmds-check.c:1151: procecss_file_extent: Assertion `!(rec->ino !=
key->objectid || rec->refs > 1)' failed.
I'm reluctant to try any of "btrfs check" options (or mount with -o
recovery) since the last three times I did this (with other partitions)
it resulted in the partition becoming entirely trashed, while before at
least "btrfs restore" still managed to extract some data each time.
The affected folder was one within /usr/include/qt4 (which I then moved
to /usr/BROKEN, to successfully reinstall the package), ie. on the
__current/ROOT subvolume.
Which seems the only subvolume affected (yet). Mounting & accessing the
other three (__current/{var,home,opt}) still works.
[*] Organised following
http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html
(Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )
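To see whether repeated "corrupt leaf" messages point at a single leaf or at several, the dmesg output can be tallied. This is a rough sketch that assumes the message format exactly as quoted in this report; the function name `summarize_corrupt_leaves` and the sample text are illustrative only:

```python
import re
from collections import Counter

# Message format assumed from the dmesg lines quoted in this report.
LEAF_RE = re.compile(
    r"corrupt leaf, slot offset bad: block=(\d+),\s*root=(\d+),\s*slot=(\d+)"
)

def summarize_corrupt_leaves(log_text):
    """Count how often each (block, root, slot) triple is reported."""
    return Counter(
        tuple(int(g) for g in m.groups()) for m in LEAF_RE.finditer(log_text)
    )

sample = """\
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
"""

for (block, root, slot), n in summarize_corrupt_leaves(sample).items():
    print(f"block={block} root={root} slot={slot}: reported {n} time(s)")
```

If every line maps to the same triple, as in the sample, the damage reported so far is confined to one slot of one leaf.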
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
2014-04-21 16:16 Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes Andreas Reis
@ 2014-04-21 19:13 ` Andreas Reis
2014-04-22 18:16 ` Andreas Reis
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Reis @ 2014-04-21 19:13 UTC (permalink / raw)
To: linux-btrfs
Alright, turns out the partition does actually mount on 3.15-rc2 (error
messages remain, of course).
But systemd will fail to continue booting as /bin/mount returns "exit
status 32" and / thus ends as ro, yet can be manually remounted as rw.
Another error message I've spotted with 3.15 is
BTRFS error (device sdc5): error loading props for ino 1810424 (root
257): -5
I've now tried to mount with -o recovery and clear_cache, no effect.
On 21.04.2014 18:16, Andreas Reis wrote:
> Kernel 3.15.0-rc2, btrfs-progs 3.14.1
>
> While doing some minor package updates my btrfs root partition [*]
> decided to corrupt itself. There was no system crash, although I had
> plenty of these (due to a USB-related regression) in recent weeks that
> resulted in no trouble.
>
> First only one of a package's folders was corrupted, any access to files
> within (incl. attempts to delete) printed
>
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
>
> to dmesg (I'm actually not sure about the numbers, but that was indeed
> the error message). After moving the folder out of the way the partition
> continued to appear working as normal, one reboot also worked fine.
>
> Now I can't boot at all (beyond loading the kernel image located on
> another partition), neither with 3.15-rc2 nor 3.14.1. Attempting to
> mount the __current/ROOT subvolume on ArchLinux's current Live-CD
> (kernel 3.13.7) prints
>
> btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
> btrfs: use ssd allocation scheme
> btrfs: disk space caching is enabled
> btrfs: checking UUID tree
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
> BTRFS error (device sdc5): Error removing orphan entry, stopping orphan
> cleanup
> BTRFS critical (device sdc5): could not do orphan cleanup -22
>
> Doing "btrfs check /dev/sdc5" merely first prints ten
>
> free space inode generation (0) did not match free space cache
> generation ([different transids between 40010 and 55578])
>
> to then abort with
>
> checking fs roots
> btrfs: cmds-check.c:1151: procecss_file_extent: Assertion `!(rec->ino !=
> key->objectid || rec->refs > 1)' failed.
>
> I'm reluctant to try any of "btrfs check" options (or mount with -o
> recovery) since the last three times I did this (with other partitions)
> it resulted in the partition becoming entirely trashed, while before at
> least "btrfs restore" still managed to extract some data each time.
>
> The affected folder was one within /usr/include/qt4 (which I then moved
> to /usr/BROKEN, to successfully reinstall the package), ie. on the
> __current/ROOT subvolume.
>
> Which seems the only subvolume affected (yet). Mounting & accessing the
> other three (__current/{var,home,opt}) still works.
>
> [*] Organised following
> http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html
>
> (Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
2014-04-21 19:13 ` Andreas Reis
@ 2014-04-22 18:16 ` Andreas Reis
2014-04-23 2:55 ` Duncan
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Reis @ 2014-04-22 18:16 UTC (permalink / raw)
To: linux-btrfs
Same failure with btrfs-progs from integration-20140421 (apart from the
line number 1156).
Can I get a bit of input on this? Is it safe to just ignore the error
for now (as I'm doing atm), ie. remount as rw to skip the orphan cleanup?
Might it even be safe to call btrfs check --repair on the partition? I'm
not keen on that failing mid-process at the same assertion and thus
breaking it over a bunch of minor files, just like it happened with my
previous btrfs partitions.
On 21.04.2014 21:13, Andreas Reis wrote:
> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns "exit
> status 32" and / thus ends as ro, yet can be manually remounted as rw.
>
> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5
>
> I've now tried to mount with -o recovery and clear_cache, no effect.
* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
2014-04-22 18:16 ` Andreas Reis
@ 2014-04-23 2:55 ` Duncan
2014-04-25 2:04 ` Bug: Andreas Reis
0 siblings, 1 reply; 4+ messages in thread
From: Duncan @ 2014-04-23 2:55 UTC (permalink / raw)
To: linux-btrfs
Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:
> Same failure with btrfs-progs from integration-20140421 (apart from the
> line number 1156).
>
> Can I get a bit of input on this? Is it safe to just ignore the error
> for now (as I'm doing atm), ie. remount as rw to skip the orphan
> cleanup?
I explained orphans in my other reply. Since they're simply not yet
completed file deletions, it should be /relatively/ safe to continue
ignoring and doing the manual remount rw, since that continues to work.
"Relatively" as in that's what I'd do in the shorter term here were I
seeing the problem, tho I'd ensure my backups were current and tested, as
should be the case on btrfs anyway since it's not entirely stable yet,
and just because I don't like nagging half-dealt-with-problems left
laying around and the error would eat at me until I'd cleared it, at some
point likely rather sooner than later, I'd very likely mkfs and restore
from those backups. But I'd certainly be willing to continue running
from the partition short term, for a week or so until I had a chance to
do the mkfs.btrfs and restore from backup, as long as that remained the
only issue I was seeing.
> Might it even be safe to call btrfs check --repair on the partition? I'm
> not keen on that failing mid-process at the same assertion and thus
> breaking it over a bunch of minor files, just like it happened with my
> previous btrfs partitions.
That I can't say. Based on reports and the common knowledge of the list,
I've become rather leery of btrfs check --repair myself, and tend to rely
on scrub and balance to fix issues if they can, and beyond that,
mkfs.btrfs and restore from backup. In fact, while btrfs check without
the --repair is safe as it's read-only, I don't run it regularly either,
because I know should it report problems I'd then be worried about things
I might have no reasonable way to fix, that obviously aren't causing me
problems anyway. Basically, if mounting and regular use of the
filesystem isn't giving me anything unusual in dmesg, I consider it good,
and for the most part I tend to route around btrfs check entirely, as
if it weren't even there, tho I've run it in default read-only mode a few
times, to compare my output with a post from the list or something,
always with a clean bill of health from btrfs check when I have run it.
That said, if you have backups tested and ready anyway, and would
otherwise be doing a mkfs.btrfs in short order in order to get rid of
those bad orphan warnings anyway, I don't see the harm in running it,
since at that point it's zero risk anyway. If you lose the filesystem as
a result, big deal, as you were going to mkfs.btrfs and restore from
backup anyway, and if it fixes the problem, well, you saved yourself the
hassle.
Plus, either way you can report back the results and then we'll know
whether it's safe to recommend btrfs check for the next report, or not.
=:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Bug:
2014-04-23 2:55 ` Duncan
@ 2014-04-25 2:04 ` Andreas Reis
0 siblings, 0 replies; 4+ messages in thread
From: Andreas Reis @ 2014-04-25 2:04 UTC (permalink / raw)
To: linux-btrfs
Duncan <1i5t5.duncan <at> cox.net> writes:
> Plus, either way you can report back the results and then we'll know
> whether it's safe to recommend btrfs check for the next report, or not.
> =:^)
Well this is just bloody brilliant.
I did btrfs check --repair with btrfs-progs from integration and a bunch of
fixes on this list applied. Failed at the same assert, but
otherwise left the partition unchanged, ie. mountable.
So as planned, thinking I have a relatively fresh backup of the
whole partition (via partclone.btrfs), I go on restoring it to
get rid of the errors.
partclone does its thing, the restored partition mounts, text
files are properly readable (!) and btrfs check reports no
errors.
Then on reboot, the kernel (residing on another partition)
instantly crashes: "Input/Output error".
Turns out that when I try to run any binary from the restored
partition (via LiveCD), *every* *single* *one* fails with this
remarkably expressive error. If I manually replace one with a
fresh download, I get a SIGBUS crash instead.
Oh, and upon accessing any of said binaries, dmesg prints a BTRFS
info that csum failed. But only for binaries.
Yay. No idea how to proceed from here, but I guess this might not
necessarily be related to btrfs. Certainly doesn't make me want
to recommend it in the foreseeable future, though.