linux-btrfs.vger.kernel.org archive mirror
* bug?
@ 2012-04-21 12:53 Thomas Weber
  2012-04-24 15:26 ` bug? Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Weber @ 2012-04-21 12:53 UTC
  To: linux-btrfs

Hello,

today my laptop crashed with the following output. Installed is
Archlinux with btrfs on a SSD.
Is it btrfs related?

Thanks,
Thomas


Apr 21 13:01:01 localhost anacron[3307]: Anacron started on 2012-04-21
Apr 21 13:01:01 localhost anacron[3307]: Will run job `cron.daily' in 48 min.
Apr 21 13:01:01 localhost anacron[3307]: Jobs will be executed sequentially
Apr 21 13:21:01 localhost -- MARK --
Apr 21 13:41:01 localhost -- MARK --
Apr 21 13:49:01 localhost anacron[3307]: Job `cron.daily' started
Apr 21 13:49:14 localhost kernel: [23420.297861] general protection fault: 0000 [#1] PREEMPT SMP
Apr 21 13:49:14 localhost kernel: [23420.297976] CPU 1
Apr 21 13:49:14 localhost kernel: [23420.298007] Modules linked in: nls_cp437 vfat fat usb_storage uas aes_x86_64 cryptd aes_generic fuse ext4 jbd2 mbcache crc16 joydev arc4 dell_wmi sparse_keymap i915 snd_hda_codec_idt iwl3945 dell_laptop dcdbas iwl_legacy mac80211 snd_hda_intel i2c_algo_bit snd_hda_codec drm_kms_helper evdev snd_hwdep snd_pcm drm serio_raw psmouse pcspkr tg3 snd_page_alloc cfg80211 snd_timer i2c_i801 iTCO_wdt iTCO_vendor_support snd i2c_core libphy rfkill soundcore intel_agp intel_gtt wmi thermal button battery video processor ac btrfs crc32c libcrc32c zlib_deflate sr_mod cdrom sd_mod pata_acpi ata_generic ata_piix libata scsi_mod ehci_hcd uhci_hcd usbcore usb_common
Apr 21 13:49:14 localhost kernel: [23420.299233]
Apr 21 13:49:14 localhost kernel: [23420.299262] Pid: 11172, comm: updatedb Not tainted 3.2.11-1-ARCH #1 Dell Inc. Latitude D530                   /0HP728
Apr 21 13:49:14 localhost kernel: [23420.299410] RIP: 0010:[<ffffffffa0180ddd>]  [<ffffffffa0180ddd>] btrfs_getattr+0x3d/0x90 [btrfs]
Apr 21 13:49:14 localhost kernel: [23420.299560] RSP: 0018:ffff880077765e38  EFLAGS: 00010206
Apr 21 13:49:14 localhost kernel: [23420.299630] RAX: 41d700000000fffe RBX: ffff8800bf6c1550 RCX: 000000000000000c
Apr 21 13:49:14 localhost kernel: [23420.299717] RDX: ffff880077765f00 RSI: ffff880077765f00 RDI: ffff8800bf6c1550
Apr 21 13:49:14 localhost kernel: [23420.299804] RBP: ffff880077765e58 R08: ffffffff81173373 R09: ffff8800ba43bcf8
Apr 21 13:49:14 localhost kernel: [23420.299891] R10: ffff8800ba43bcc0 R11: 0000000000000005 R12: ffff880077765f00
Apr 21 13:49:14 localhost kernel: [23420.299978] R13: 0000000000001000 R14: ffff880077765f00 R15: 0000000001e54c50
Apr 21 13:49:14 localhost kernel: [23420.300067] FS:  00007f3e42b8f700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 21 13:49:14 localhost kernel: [23420.300240] CR2: 0000000001e71ffc CR3: 0000000058685000 CR4: 00000000000006e0
Apr 21 13:49:14 localhost kernel: [23420.300327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300414] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 21 13:49:14 localhost kernel: [23420.300502] Process updatedb (pid: 11172, threadinfo ffff880077764000, task ffff88008a8023a0)
Apr 21 13:49:14 localhost kernel: [23420.300603] Stack:
Apr 21 13:49:14 localhost kernel: [23420.300635]  ffff880077765f68 ffff8800ba43bcc0 ffff88011707bd00 ffff8800bf6c1550
Apr 21 13:49:14 localhost kernel: [23420.300753]  ffff880077765e98 ffffffff8116c51e ffff880077765f00 0000000001e38319
Apr 21 13:49:14 localhost kernel: [23420.300869]  ffff880077765f00 0000000001e38319 00007fffd821ca18 0000000000000000
Apr 21 13:49:14 localhost kernel: [23420.300985] Call Trace:
Apr 21 13:49:14 localhost kernel: [23420.301001]  [<ffffffff8116c51e>] vfs_getattr+0x4e/0x80
Apr 21 13:49:14 localhost kernel: [23420.301001]  [<ffffffff8116c59e>] vfs_fstatat+0x4e/0x70
Apr 21 13:49:14 localhost kernel: [23420.301001]  [<ffffffff8116c5de>] vfs_lstat+0x1e/0x20
Apr 21 13:49:14 localhost kernel: [23420.301001]  [<ffffffff8116c77a>] sys_newlstat+0x1a/0x40
Apr 21 13:49:14 localhost kernel: [23420.301001]  [<ffffffff8145ddc2>] system_call_fastpath+0x16/0x1b
Apr 21 13:49:14 localhost kernel: [23420.301001] Code: 6d f8 66 66 66 66 90 48 8b 5e 30 48 89 d6 49 89 d4 48 8b 43 28 48 89 df 44 8b 68 18 e8 5d b2 fe e0 48 8b 83 60 fe ff ff 48 89 df <8b> 80 00 04 00 00 49 c7 44 24 58 00 10 00 00 41 89 44 24 08 e8
Apr 21 13:49:14 localhost kernel: [23420.301001] RIP  [<ffffffffa0180ddd>] btrfs_getattr+0x3d/0x90 [btrfs]
Apr 21 13:49:14 localhost kernel: [23420.301001]  RSP <ffff880077765e38>
Apr 21 13:49:14 localhost anacron[3307]: Job `cron.daily' terminated (exit status: 1) (mailing output)
Apr 21 13:49:14 localhost anacron[3307]: Can't find sendmail at /usr/sbin/sendmail, not mailing output
Apr 21 13:49:14 localhost anacron[3307]: Normal exit (1 job run)
Apr 21 13:49:14 localhost kernel: [23420.365362] ---[ end trace 27dae2a049083cf1 ]---
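
A note on reading the trace, offered as a sketch rather than a diagnosis. The byte marked `<8b>` in the Code: line begins `8b 80 00 04 00 00`, which decodes to a 32-bit load from RAX + 0x400, and RAX in the register dump is 41d700000000fffe. On x86-64 a load through a non-canonical address raises a general protection fault rather than a page fault, which would fit the first line of the oops. The snippet below only checks canonicality; it assumes 48-bit virtual addresses (4-level paging), and `is_canonical` is a hypothetical helper name, not anything from the kernel.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumption: va_bits-bit virtual addressing, so bits 63..(va_bits-1)
    must all be equal (all zeros or all ones).
    """
    top = addr >> (va_bits - 1)                # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


RAX = 0x41D700000000FFFE   # RAX from the register dump
DISP = 0x400               # displacement decoded from "<8b> 80 00 04 00 00"
RBX = 0xFFFF8800BF6C1550   # a kernel pointer from the same dump, for contrast

print(f"RAX+0x400 canonical: {is_canonical(RAX + DISP)}")
print(f"RBX canonical:       {is_canonical(RBX)}")
```

This says nothing about why the pointer was bad (memory corruption, a stale inode, faulty RAM), only that the faulting access went through a value that cannot be a valid pointer.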



* Re: bug?
  2012-04-21 12:53 bug? Thomas Weber
@ 2012-04-24 15:26 ` Josef Bacik
  2012-04-24 15:47   ` bug? Thomas Weber
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2012-04-24 15:26 UTC
  To: Thomas Weber; +Cc: linux-btrfs

On Sat, Apr 21, 2012 at 02:53:55PM +0200, Thomas Weber wrote:
> Hello,
> 
> today my laptop crashed with the following output. Installed is
> Archlinux with btrfs on a SSD.
> Is it btrfs related?

Sort of an old kernel, can you try on something recent?  It doesn't look
familiar but who knows.  Thanks,

Josef


* Re: bug?
  2012-04-24 15:26 ` bug? Josef Bacik
@ 2012-04-24 15:47   ` Thomas Weber
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Weber @ 2012-04-24 15:47 UTC
  To: Josef Bacik; +Cc: Thomas Weber, linux-btrfs

Hello Josef,

On 04/24/2012 05:26 PM, Josef Bacik wrote:
> On Sat, Apr 21, 2012 at 02:53:55PM +0200, Thomas Weber wrote:
>> Hello,
>>
>> today my laptop crashed with the following output. Installed is
>> Archlinux with btrfs on a SSD.
>> Is it btrfs related?
> Sort of an old kernel, can you try on something recent?  It doesn't look
> familiar but who knows.  Thanks,
>
> Josef
I was on the 3.2 kernel because of the enospc problem. Today I updated
to 3.3.3 kernel.

Thomas


* Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
@ 2014-04-21 16:16 Andreas Reis
  2014-04-21 19:13 ` Andreas Reis
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Reis @ 2014-04-21 16:16 UTC
  To: linux-btrfs

Kernel 3.15.0-rc2, btrfs-progs 3.14.1

While doing some minor package updates my btrfs root partition [*] 
decided to corrupt itself. There was no system crash, although I had 
plenty of these (due to a USB-related regression) in recent weeks that 
resulted in no trouble.

First only one of a package's folders was corrupted, any access to files 
within (incl. attempts to delete) printed

btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88

to dmesg (I'm actually not sure about the numbers, but that was indeed 
the error message). After moving the folder out of the way the partition 
continued to appear working as normal, one reboot also worked fine.

Now I can't boot at all (beyond loading the kernel image located on 
another partition), neither with 3.15-rc2 nor 3.14.1. Attempting to 
mount the __current/ROOT subvolume on ArchLinux's current Live-CD 
(kernel 3.13.7) prints

btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
btrfs: use ssd allocation scheme
btrfs: disk space caching is enabled
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS error (device sdc5): Error removing orphan entry, stopping orphan 
cleanup
BTRFS critical (device sdc5): could not do orphan cleanup -22

Doing "btrfs check /dev/sdc5" merely first prints ten

free space inode generation (0) did not match free space cache 
generation ([different transids between 40010 and 55578])

to then abort with

checking fs roots
btrfs: cmds-check.c:1151: procecss_file_extent: Assertion `!(rec->ino != 
key->objectid || rec->refs > 1)' failed.

I'm reluctant to try any of "btrfs check" options (or mount with -o 
recovery) since the last three times I did this (with other partitions) 
it resulted in the partition becoming entirely trashed, while before at 
least "btrfs restore" still managed to extract some data each time.

The affected folder was one within /usr/include/qt4 (which I then moved 
to /usr/BROKEN, to successfully reinstall the package), ie. on the 
__current/ROOT subvolume.

Which seems the only subvolume affected (yet). Mounting & accessing the 
other three (__current/{var,home,opt}) still works.

[*] Organised following 
http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html

(Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )


* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
  2014-04-21 16:16 Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes Andreas Reis
@ 2014-04-21 19:13 ` Andreas Reis
  2014-04-21 23:44   ` Duncan
  2014-04-22 18:16   ` Andreas Reis
  0 siblings, 2 replies; 12+ messages in thread
From: Andreas Reis @ 2014-04-21 19:13 UTC
  To: linux-btrfs

Alright, turns out the partition does actually mount on 3.15-rc2 (error 
messages remain, of course).

But systemd will fail to continue booting as /bin/mount returns "exit 
status 32" and / thus ends as ro, yet can be manually remounted as rw.

Another error message I've spotted with 3.15 is

BTRFS error (device sdc5): error loading props for ino 1810424 (root 
257): -5
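
For anyone reading along: the trailing numbers in these messages (-22 in the orphan cleanup line, -5 here) are negated errno values. A small sketch for looking them up, assuming a Linux host; `describe_kernel_error` is a hypothetical helper name.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel return code such as -22 into text."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{code}: {name} ({os.strerror(number)})"


for code in (-22, -5):
    print(describe_kernel_error(code))
```

On Linux, 22 is EINVAL and 5 is EIO.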

I've now tried to mount with -o recovery and clear_cache, no effect.

On 21.04.2014 18:16, Andreas Reis wrote:
> Kernel 3.15.0-rc2, btrfs-progs 3.14.1
>
> While doing some minor package updates my btrfs root partition [*]
> decided to corrupt itself. There was no system crash, although I had
> plenty of these (due to a USB-related regression) in recent weeks that
> resulted in no trouble.
>
> First only one of a package's folders was corrupted, any access to files
> within (incl. attempts to delete) printed
>
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
>
> to dmesg (I'm actually not sure about the numbers, but that was indeed
> the error message). After moving the folder out of the way the partition
> continued to appear working as normal, one reboot also worked fine.
>
> Now I can't boot at all (beyond loading the kernel image located on
> another partition), neither with 3.15-rc2 nor 3.14.1. Attempting to
> mount the __current/ROOT subvolume on ArchLinux's current Live-CD
> (kernel 3.13.7) prints
>
> btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
> btrfs: use ssd allocation scheme
> btrfs: disk space caching is enabled
> btrfs: checking UUID tree
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
> btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
> BTRFS error (device sdc5): Error removing orphan entry, stopping orphan
> cleanup
> BTRFS critical (device sdc5): could not do orphan cleanup -22
>
> Doing "btrfs check /dev/sdc5" merely first prints ten
>
> free space inode generation (0) did not match free space cache
> generation ([different transids between 40010 and 55578])
>
> to then abort with
>
> checking fs roots
> btrfs: cmds-check.c:1151: procecss_file_extent: Assertion `!(rec->ino !=
> key->objectid || rec->refs > 1)' failed.
>
> I'm reluctant to try any of "btrfs check" options (or mount with -o
> recovery) since the last three times I did this (with other partitions)
> it resulted in the partition becoming entirely trashed, while before at
> least "btrfs restore" still managed to extract some data each time.
>
> The affected folder was one within /usr/include/qt4 (which I then moved
> to /usr/BROKEN, to successfully reinstall the package), ie. on the
> __current/ROOT subvolume.
>
> Which seems the only subvolume affected (yet). Mounting & accessing the
> other three (__current/{var,home,opt}) still works.
>
> [*] Organised following
> http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html
>
> (Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )



* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
  2014-04-21 19:13 ` Andreas Reis
@ 2014-04-21 23:44   ` Duncan
  2014-04-22 18:16   ` Andreas Reis
  1 sibling, 0 replies; 12+ messages in thread
From: Duncan @ 2014-04-21 23:44 UTC
  To: linux-btrfs

Andreas Reis posted on Mon, 21 Apr 2014 21:13:16 +0200 as excerpted:

> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
> 
> But systemd will fail to continue booting as /bin/mount returns "exit
> status 32" and / thus ends as ro, yet can be manually remounted as rw.

The mount manpage says status 32 is mount failure.  Dmesg should contain 
more, but that's probably the errors you already mentioned.
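
As an aside, the exit status of mount is a bit mask, so a value can combine several conditions. A minimal sketch that decodes it, with the table taken from the exit-status section of the mount(8) manpage; `decode_mount_status` is a hypothetical helper name.

```python
# Bit meanings as listed in mount(8); the values can be ORed together.
MOUNT_EXIT_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status: int) -> list:
    """Return the list of conditions encoded in a mount exit status."""
    if status == 0:
        return ["success"]
    return [text for bit, text in MOUNT_EXIT_BITS.items() if status & bit]


print(decode_mount_status(32))
```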

So you're getting the read-only mount, but can't remount rw.

(This doesn't apply in your case, but FWIW, I now have my root filesystem 
setup to be ro mounted by default, and have been running that way for 
some months, now.  Seems safer that way.  The only time I remount / rw is 
when I'm updating the system or changing something in the config, then I 
normally remount ro again, altho after updating the system I normally 
have to exit and restart X and kde as well as various system services 
before I can remount ro, depending on what libraries got changed out from 
under my running processes.  Of course in order to make this work a 
few /var/ subdirs that need to be writable are actually symlinks to
/home/var/ subdirs, /var/log is a dedicated writable logging partition of 
its own, etc.  So a read-only rootfs is the /normal/ case for me, and 
wouldn't interfere with normal operations at all. =:^)

> Another error message I've spotted with 3.15 is
> 
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5

That would be one of the new btrfs properties introduced in kernel 3.14.  
See btrfs property list/get/set...  Unless you've set individual file 
properties (such as compress), that's probably a property (such as ro/rw) 
on a subvolume, or possibly on the main filesystem (label, etc).

Meanwhile, "orphans" normally refer to files that are deleted while 
they're still in use.  Normally, these will be libraries, etc, replaced 
during a system upgrade, but still in use by running programs.  Once all 
such running programs have been restarted (loading the new version of the 
library) or terminated, the filesystem can be unmounted or remounted read-
only.  In the event they're not fully cleaned up at umount time, they are 
normally cleaned up after reboot, when a filesystem is first mounted 
writable once again.

Obviously there's a problem with one of these orphans, and attempts to 
clean it up are failing, causing the remount rw to fail.

While that doesn't help with fixing the problem, it should at least give 
you some idea of what's going on, and how to interpret the messages and 
errors you see.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
  2014-04-21 19:13 ` Andreas Reis
  2014-04-21 23:44   ` Duncan
@ 2014-04-22 18:16   ` Andreas Reis
  2014-04-23  2:55     ` Duncan
  2014-04-23 15:02     ` Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes Andreas Reis
  1 sibling, 2 replies; 12+ messages in thread
From: Andreas Reis @ 2014-04-22 18:16 UTC
  To: linux-btrfs

Same failure with btrfs-progs from integration-20140421 (apart from the 
line number 1156).

Can I get a bit of input on this? Is it safe to just ignore the error 
for now (as I'm doing atm), ie. remount as rw to skip the orphan cleanup?

Might it even be safe to call btrfs check --repair on the partition? I'm 
not keen on that failing mid-process at the same assertion and thus 
breaking it over a bunch of minor files, just like it happened with my 
previous btrfs partitions.

On 21.04.2014 21:13, Andreas Reis wrote:
> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns "exit
> status 32" and / thus ends as ro, yet can be manually remounted as rw.
>
> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5
>
> I've now tried to mount with -o recovery and clear_cache, no effect.


* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
  2014-04-22 18:16   ` Andreas Reis
@ 2014-04-23  2:55     ` Duncan
  2014-04-25  2:04       ` Bug: Andreas Reis
  2014-04-23 15:02     ` Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes Andreas Reis
  1 sibling, 1 reply; 12+ messages in thread
From: Duncan @ 2014-04-23  2:55 UTC
  To: linux-btrfs

Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:

> Same failure with btrfs-progs from integration-20140421 (apart from the
> line number 1156).
> 
> Can I get a bit of input on this? Is it safe to just ignore the error
> for now (as I'm doing atm), ie. remount as rw to skip the orphan
> cleanup?

I explained orphans in my other reply.  Since they're simply not yet 
completed file deletions, it should be /relatively/ safe to continue 
ignoring and doing the manual remount rw, since that continues to work.

"Relatively" as in that's what I'd do in the shorter term here were I 
seeing the problem, tho I'd ensure my backups were current and tested, as 
should be the case on btrfs anyway since it's not entirely stable yet, 
and just because I don't like nagging half-dealt-with-problems left 
lying around and the error would eat at me until I'd cleared it, at some 
point likely rather sooner than later, I'd very likely mkfs and restore 
from those backups.  But I'd certainly be willing to continue running 
from the partition short term, for a week or so until I had a chance to 
do the mkfs.btrfs and restore from backup, as long as that remained the 
only issue I was seeing.

> Might it even be safe to call btrfs check --repair on the partition? I'm
> not keen on that failing mid-process at the same assertion and thus
> breaking it over a bunch of minor files, just like it happened with my
> previous btrfs partitions.

That I can't say.  Based on reports and the common knowledge of the list, 
I've become rather leery of btrfs check --repair myself, and tend to rely 
on scrub and balance to fix issues if they can, and beyond that, 
mkfs.btrfs and restore from backup.  In fact, while btrfs check without 
the --repair is safe as it's read-only, I don't run it regularly either, 
because I know should it report problems I'd then be worried about things 
I might have no reasonable way to fix, that obviously aren't causing me 
problems anyway.  Basically, if mounting and regular use of the 
filesystem isn't giving me anything unusual in dmesg, I consider it good, 
and for the most part I tend to route around btrfs check entirely, as 
if it weren't even there, tho I've run it in default read-only mode a few 
times, to compare my output with a post from the list or something, 
always with a clean bill of health from btrfs check when I have run it.
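
The "nothing unusual in dmesg" test can be roughly automated. The sketch below is only an illustration: the patterns are the ones quoted in this thread, not an exhaustive list of btrfs error messages, and `suspicious_btrfs_lines` is a hypothetical helper name. It works on text passed in, so it can be fed saved dmesg output.

```python
import re

# Assumption: these keywords come from messages seen in this thread only.
SUSPICIOUS = re.compile(
    r"btrfs.*(corrupt leaf|csum failed|error|critical)", re.IGNORECASE
)


def suspicious_btrfs_lines(log_text: str) -> list:
    """Return the btrfs lines in log_text that match a known-bad pattern."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


sample = """\
btrfs: disk space caching is enabled
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS critical (device sdc5): could not do orphan cleanup -22
"""

for line in suspicious_btrfs_lines(sample):
    print(line)
```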

That said, if you have backups tested and ready anyway, and would 
otherwise be doing a mkfs.btrfs in short order to get rid of 
those bad orphan warnings anyway, I don't see the harm in running it, 
since at that point it's zero risk anyway.  If you lose the filesystem as 
a result, big deal, as you were going to mkfs.btrfs and restore from 
backup anyway, and if it fixes the problem, well, you saved yourself the 
hassle.

Plus, either way you can report back the results and then we'll know 
whether it's safe to recommend btrfs check for the next report, or not. 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Bug: "corrupt leaf. slot offset bad": root subvolume unmountable, "btrfs check" crashes
  2014-04-22 18:16   ` Andreas Reis
  2014-04-23  2:55     ` Duncan
@ 2014-04-23 15:02     ` Andreas Reis
  1 sibling, 0 replies; 12+ messages in thread
From: Andreas Reis @ 2014-04-23 15:02 UTC
  To: linux-btrfs

Ah. Thank you for the replies. I didn't get them as mails and spinics 
didn't update the thread until yesterday.

So I take it that the recommended course of action is not to wait for 
any more or less unlikely btrfs-progs fix, but to try --repair and be 
ready to restore from backup, too. Darn, and that over what probably 
doesn't amount to more than a few dozen KB. Wish I could simply replace 
the single subvolume instead, but I suppose that's one of btrfs's drawbacks.

I did a full partition backup some three weeks ago, so I'll have to 
spend some hours to figure out what has changed since then, and how to 
do incremental backups of it to different devices for the next time…

I don't have the time atm though; it'll probably take at least a week 
(unless the partition decides to die) to report back.

As a side note, there was an ostensibly similar issue fixed in 2012: 
https://bugzilla.novell.com/show_bug.cgi?id=760279 Guess that was a 
different underlying issue, though.

Duncan posted on Wed, 23 Apr 2014 02:55:36 +0000:

 > Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:
 >
 > > Same failure with btrfs-progs from integration-20140421 (apart from
 > > the line number 1156).
 > >
 > > Can I get a bit of input on this? Is it safe to just ignore the
 > > error for now (as I'm doing atm), ie. remount as rw to skip the
 > > orphan cleanup?
 >
 > I explained orphans in my other reply.  Since they're simply not yet
 > completed file deletions, it should be /relatively/ safe to continue
 > ignoring and doing the manual remount rw, since that continues to
 > work.


* Re: Bug:
  2014-04-23  2:55     ` Duncan
@ 2014-04-25  2:04       ` Andreas Reis
  2014-04-25  2:43         ` Bug: Partition borked Andreas Reis
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Reis @ 2014-04-25  2:04 UTC
  To: linux-btrfs

Duncan <1i5t5.duncan <at> cox.net> writes:

> Plus, either way you can report back the results and then we'll know
> whether it's safe to recommend btrfs check for the next report, or not.
> =:^)

Well this is just bloody brilliant.

I ran btrfs check --repair using btrfs-progs from integration with a bunch of 
fixes from this list applied. It failed at the same assert, but 
otherwise left the partition unchanged, i.e. mountable.

So as planned, thinking I have a relatively fresh backup of the 
whole partition (via partclone.btrfs), I go on restoring it to 
get rid of the errors.

partclone does its thing, the restored partition mounts, text 
files are properly readable (!) and btrfs check reports no 
errors.

Then on reboot, the kernel (residing on another partition) 
instantly crashes: "Input/Output error".

Turns out that when I try to run any binary from the restored 
partition (via LiveCD), *every* *single* *one* fails with this 
remarkably expressive error. If I manually replace one with a 
fresh download, I get a SIGBUS crash instead.

Oh, and upon accessing any of said binaries, dmesg prints a BTRFS 
info that csum failed. But only for binaries.

Yay. No idea how to proceed from here, but I guess this might not 
necessarily be related to btrfs. Certainly doesn't make me want 
to recommend it in the foreseeable future, though.



* Re: Bug: Partition borked
  2014-04-25  2:04       ` Bug: Andreas Reis
@ 2014-04-25  2:43         ` Andreas Reis
  2014-04-25  3:03           ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Reis @ 2014-04-25  2:43 UTC
  To: linux-btrfs

Andreas Reis <andreas.reis <at> gmail.com> writes:

> Turns out that when I try to run any binary from the restored 
> partition (via LiveCD), *every* *single* *one* fails with this 
> remarkably expressive error. If I manually replace one with a 
> fresh download, I get a SIGBUS crash instead.

Alright, there are corrupt text files too, after all. As well as a 
handful of non-corrupted binaries.

Always the same type of btrfs error message though. Interestingly, 
the false csum reported stays exactly the same: 2566472073. Also, 
btrfs check --init-csum-tree fails with a plethora of backref 
errors.
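
One check that may be worth doing with a csum value that never changes: compare it against the checksum of an all-zero block. The sketch below assumes btrfs data checksums are standard CRC-32C (the crc32c module is what btrfs loads), that the block size is 4 KiB, and that the number dmesg prints is the finalized checksum as an unsigned decimal. It prints the comparison rather than asserting an outcome; `crc32c` here is a plain bitwise implementation written for illustration, not the kernel's.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


REPORTED = 2566472073              # the repeated csum value from dmesg
zero_block = crc32c(bytes(4096))   # assumption: one all-zero 4 KiB block

print(f"reported:   {REPORTED:#010x}")
print(f"zero block: {zero_block:#010x}")
print("match" if zero_block == REPORTED else "no match")
```

A match would suggest the reads are returning zero-filled blocks rather than random garbage, which is a different failure from corrupted contents.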

Guess it doesn't matter whether it's the backup or the LiveCD's 
kernel (3.13.7) that's at fault, I'm going to have to reinstall 
either way.



* Re: Bug: Partition borked
  2014-04-25  2:43         ` Bug: Partition borked Andreas Reis
@ 2014-04-25  3:03           ` Chris Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Murphy @ 2014-04-25  3:03 UTC
  To: Andreas Reis; +Cc: linux-btrfs


On Apr 24, 2014, at 8:43 PM, Andreas Reis <andreas.reis@gmail.com> wrote:

> Andreas Reis <andreas.reis <at> gmail.com> writes:
> 
>> Turns out that when I try to run any binary from the restored 
>> partition (via LiveCD), *every* *single* *one* fails with this 
>> remarkably expressive error. If I manually replace one with a 
>> fresh download, I get a SIGBUS crash instead.
> 
> Alright, there are corrupt text files too, after all. As well as a 
> handful of non-corrupted binaries.
> 
> Always the same type of btrfs error message though. Interestingly, 
> the false csum reported stays exactly the same: 2566472073. Also, 
> btrfs check --init-csum-tree fails with a plethora of backref 
> errors.

That command obliterates the csum tree. csums are not recomputed for already written files. Anytime you read an existing file, e.g. merely copy it, you'll get a long pile of csum errors because there's missing csums.

btrfs check itself is benign, but the options --init* and --repair have been fairly vertical fixes for specific problems and can make others worse; although that experience is largely based on older progs. I'm not sure yet how well 3.14 is repairing, and haven't looked at the changelog to see if btrfsck has been significantly updated in it.

Chris Murphy

