* Hitting error after failed balance
From: Mitchel Humpherys @ 2014-01-14 19:47 UTC
To: linux-btrfs
I have a btrfs volume with two disks in it. Inspired by the recent LWN
series on btrfs, I set it up last week and things seemed to be going
quite well. However, I tried to balance the disks the other day since
they were quite out-of-balance, but the balance job failed to complete,
erroring out with -ENOSPC (even though there was still space left on
both disks). Things seemed okay for a day or so after that, but now I've
been hitting the following WARN() pretty much constantly:
[346628.037252] ------------[ cut here ]------------
[346628.037258] WARNING: CPU: 2 PID: 404 at fs/btrfs/super.c:255 __btrfs_abort_transaction+0x11d/0x130 [btrfs]()
[346628.037260] btrfs: Transaction aborted (error -5)
[346628.037261] Modules linked in: arc4 ecb md4 md5 hmac nls_utf8 cifs fuse nfsv3 rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss oid_registry nfs_acl snd_hda_codec_hdmi qmi_wwan cdc_wdm usbnet mii cdc_acm snd_hda_codec_realtek ftdi_sio usbserial snd_hda_intel snd_hda_codec x86_pkg_temp_thermal intel_powerclamp coretemp nouveau kvm_intel kvm mxm_wmi video ttm drm_kms_helper drm snd_hwdep snd_pcm snd_page_alloc snd_timer snd psmouse crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper iTCO_wdt hp_wmi i2c_algo_bit soundcore sparse_keymap e1000e serio_raw cryptd iTCO_vendor_support rfkill gpio_ich ptp evdev pps_core wmi e1000 shpchp i2c_i801 i2c_core microcode processor mei_me mei lpc_ich button pcspkr vboxdrv(O) nfs lockd sunrpc fscache ext4
[346628.037296] crc16 mbcache jbd2 usb_storage btrfs libcrc32c hid_generic usbhid hid xor raid6_pq sr_mod cdrom sd_mod ata_generic crc32c_intel ahci libahci pata_acpi ehci_pci ehci_hcd libata usbcore usb_common scsi_mod
[346628.037307] CPU: 2 PID: 404 Comm: btrfs-transacti Tainted: G W O 3.12.6-1-ARCH #1
[346628.037309] Hardware name: Hewlett-Packard HP Z210 Workstation/1587h, BIOS J51 v01.35 01/10/2012
[346628.037309] 0000000000000009 ffff8800caed3ca8 ffffffff814ee4fb ffff8800caed3cf0
[346628.037312] ffff8800caed3ce0 ffffffff81062bcd 00000000fffffffb ffff880416b13000
[346628.037314] ffff8803d82ce500 ffffffffa02dc280 0000000000000a9b ffff8800caed3d40
[346628.037317] Call Trace:
[346628.037320] [<ffffffff814ee4fb>] dump_stack+0x54/0x8d
[346628.037323] [<ffffffff81062bcd>] warn_slowpath_common+0x7d/0xa0
[346628.037326] [<ffffffff81062c3c>] warn_slowpath_fmt+0x4c/0x50
[346628.037334] [<ffffffffa023fbdd>] __btrfs_abort_transaction+0x11d/0x130 [btrfs]
[346628.037342] [<ffffffffa025a263>] btrfs_run_delayed_refs+0x443/0x550 [btrfs]
[346628.037351] [<ffffffffa026a1ae>] btrfs_commit_transaction+0x4e/0x9d0 [btrfs]
[346628.037359] [<ffffffffa02619ad>] transaction_kthread+0x19d/0x220 [btrfs]
[346628.037367] [<ffffffffa0261810>] ? free_fs_root+0xc0/0xc0 [btrfs]
[346628.037370] [<ffffffff81084fe0>] kthread+0xc0/0xd0
[346628.037373] [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
[346628.037375] [<ffffffff814fcffc>] ret_from_fork+0x7c/0xb0
[346628.037378] [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
[346628.037379] ---[ end trace fe83d0a80efc9fc0 ]---
Larger kernel log (including the part where the balance job failed)
here: https://gist.github.com/mgalgs/8423964
Rebooting didn't clear things up. I also tried mounting with
skip_balance, but still get the same error. I'm not sure if the balance
is actually related, but something is very unhappy here.
Any ideas what's going on here? Is this salvageable?
Other possibly relevant information:
$ sudo btrfs filesystem show /local
Label: none uuid: 03a83a42-0bc7-42a2-bed6-df19c825897c
Total devices 2 FS bytes used 380.83GiB
devid 1 size 410.18GiB used 164.03GiB path /dev/sda6
devid 2 size 465.76GiB used 220.00GiB path /dev/sdc
Btrfs v3.12
$ uname -a
Linux mitchelh-linux 3.12.6-1-ARCH #1 SMP PREEMPT Fri Dec 20 19:39:00 CET 2013 x86_64 GNU/Linux
/dev/sda6 started out as an ext4 partition that I converted with
btrfs-convert. /dev/sdc was fresh. I've been using /dev/sda for over 1.5
years without issues; /dev/sdc is new, so it could be a wildcard.
Let me know if you need any more information.
Please Cc me on replies since I'm not subscribed to this list.
Cool filesystem, by the way :). I can't say I've been excited about a
filesystem since playing around with ZFS on FreeBSD, but btrfs is pretty
awesome. The user interface is great.
Thanks!
--
Mitch
* Re: Hitting error after failed balance
From: Mitchel Humpherys @ 2014-01-14 23:21 UTC
To: linux-btrfs
On Tue, Jan 14 2014 at 11:47:18 AM, Mitchel Humpherys <mitch.special@gmail.com> wrote:
>
> Other possibly relevant information:
>
> $ sudo btrfs filesystem show /local
> Label: none uuid: 03a83a42-0bc7-42a2-bed6-df19c825897c
> Total devices 2 FS bytes used 380.83GiB
> devid 1 size 410.18GiB used 164.03GiB path /dev/sda6
> devid 2 size 465.76GiB used 220.00GiB path /dev/sdc
>
> Btrfs v3.12
>
> $ uname -a
> Linux mitchelh-linux 3.12.6-1-ARCH #1 SMP PREEMPT Fri Dec 20 19:39:00 CET 2013 x86_64 GNU/Linux
>
Well I tried btrfsck --repair and now it appears to be unmountable...
$ sudo btrfsck --repair /dev/sda6
enabling repair mode
Checking filesystem on /dev/sda6
UUID: 03a83a42-0bc7-42a2-bed6-df19c825897c
checking extents
Backref 122643533824 parent 5 root 5 not found in extent tree
Backref 122643533824 parent 123939635200 not referenced back 0x55ef990
Incorrect global backref count on 122643533824 found 2 wanted 1
backpointer mismatch on [122643533824 4096]
...
Incorrect global backref count on 846095245312 found 1 wanted 0
backpointer mismatch on [846095245312 4096]
Backref 846103539712 parent 2 root 2 not found in extent tree
Backref 846103539712 root 2 not referenced back 0x15d4460
Incorrect global backref count on 846103539712 found 1 wanted 0
backpointer mismatch on [846103539712 4096]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 257 errors 800, odd csum item
found 43901995420 bytes used err is 1
total csum bytes: 363657120
total tree bytes: 9200095232
total fs tree bytes: 8303808512
total extent tree bytes: 473149440
btree space waste bytes: 2280314010
file data blocks allocated: 864793686016
referenced 757795065856
Btrfs v3.12
$ sudo mount -a
mount: wrong fs type, bad option, bad superblock on /dev/sda6,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
fstab entry:
/dev/sda6 /local btrfs skip_balance 0 0
and here's the kernel log from the mount -a:
[13462.391937] btrfs: device fsid 03a83a42-0bc7-42a2-bed6-df19c825897c devid 2 transid 95625 /dev/sdc
[13462.532136] btrfs: device fsid 03a83a42-0bc7-42a2-bed6-df19c825897c devid 1 transid 95625 /dev/sda6
[13479.393750] btrfs: device fsid 03a83a42-0bc7-42a2-bed6-df19c825897c devid 1 transid 95625 /dev/sda6
[13479.394386] btrfs: disk space caching is enabled
[13479.995184] parent transid verify failed on 625292959744 wanted 95626 found 95625
[13479.995191] btrfs: failed to read log tree
[13480.051334] btrfs: open_ctree failed
> wanted 95626 found 95625
So close! :)
Am I completely hosed?
--
Mitch
* Re: Hitting error after failed balance
From: Duncan @ 2014-01-15 17:19 UTC
To: linux-btrfs
Mitchel Humpherys posted on Tue, 14 Jan 2014 15:21:19 -0800 as excerpted:
> On Tue, Jan 14 2014 at 11:47:18 AM, Mitchel Humpherys
> <mitch.special@gmail.com> wrote:
>>
>> Btrfs v3.12
>>
>> $ uname -a
>> Linux mitchelh-linux 3.12.6-1-ARCH #1 SMP PREEMPT Fri Dec 20 19:39:00 CET 2013 x86_64 GNU/Linux
>>
> Well I tried btrfsck --repair and now it appears to be unmountable...
[If you wish to continue being direct-mailed as well, please keep that
request in every reply, as I read/reply-to this list as a newsgroup via
gmane.org, and in my news client replying via email is an extra step, so
I don't do it by default but do try to honor requests when I see 'em.]
You're aware that btrfsck --repair is considered a last-ditch option,
right?
In some cases it makes the problems worse, not better. There are other
things to be tried first. When it does come down to btrfsck --repair and
the filesystem is still mountable, as yours was, the recommendation is to
update your backups first, test that you can actually recover from them
in case the repair makes things worse, and only /then/ try the repair.
(Since btrfs is considered testing and still under development, you
should have routine, tested backups any time you're testing it anyway,
so this means updating an existing backup, not making your FIRST one =:^)
> $ sudo mount -a
> mount: wrong fs type, bad option, bad superblock on
> /dev/sda6, missing codepage or helper program, or other error
> and here's the kernel log from the mount -a:
>
> [13479.995191] btrfs: failed to read log tree
> [13480.051334] btrfs: open_ctree failed
There's the btrfs-zero-log tool. When the kernel says the log can't be
read, that's the thing to try. You will lose the last few seconds of
transactions recorded in the log since the last commit, but btrfs is
designed to maintain a consistent state at each commit, so in theory,
when the log is corrupt and can't be used anyway, zeroing it should at
least get you back to a consistent filesystem state.
Though I'm not sure what further damage btrfsck --repair might have done,
or what damage you already had from the failed balance and earlier. But
hopefully, without the log, the filesystem is at least mountable.
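As a rough sketch of that step, not a definitive procedure: the helper
names needs_zero_log and zero_log_plan are made up for illustration, the
device and mount point are the ones from this thread, and the commands are
only printed, never run. It assumes the standalone btrfs-zero-log <device>
tool from the btrfs-progs of this era and an unmounted filesystem.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel log text on stdin contains
# the "failed to read log tree" message seen in the mount attempt above.
needs_zero_log() {
    grep -q 'btrfs: failed to read log tree'
}

# Hypothetical helper: print (not run) the suggested commands for a device
# and mount point. Zeroing the log discards whatever was in it, so the
# commands are left for the admin to run by hand.
zero_log_plan() {
    dev=$1
    mnt=$2
    printf 'btrfs-zero-log %s\n' "$dev"
    printf 'mount -o skip_balance %s %s\n' "$dev" "$mnt"
}

# Sample line copied from the kernel log quoted earlier in the thread.
sample='[13479.995191] btrfs: failed to read log tree'

if printf '%s\n' "$sample" | needs_zero_log; then
    zero_log_plan /dev/sda6 /local
fi
```

Printing the commands rather than executing them is deliberate: dropping
the log is irreversible, so it is worth reading the plan before running it
as root.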
> So close! :)
>
> Am I completely hosed?
If zero-log doesn't work, there's still hope. You can try
btrfs-find-root and btrfs restore, using the instructions here:
https://btrfs.wiki.kernel.org/index.php/Restore
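A minimal sketch of how those two tools fit together, under stated
assumptions: the run and restore_plan helpers and the /mnt/recovery
destination are hypothetical, and the tree root bytenr is deliberately
left for you to fill in from whatever btrfs-find-root reports. It assumes
the -t <bytenr> option of btrfs restore for pointing it at an alternate
tree root; the wiki page above remains the authoritative procedure.

```shell
#!/bin/sh
# Hypothetical wrapper: print a command unless RUN=1 is set, so the sketch
# is safe to read through before anything touches the disks.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper. $1 = device, $2 = destination directory on a
# different, healthy filesystem, $3 = optional tree root bytenr taken
# from the btrfs-find-root output.
restore_plan() {
    dev=$1
    dest=$2
    bytenr=${3:-}
    run btrfs-find-root "$dev"
    if [ -n "$bytenr" ]; then
        run btrfs restore -t "$bytenr" "$dev" "$dest"
    else
        run btrfs restore "$dev" "$dest"
    fi
}

restore_plan /dev/sda6 /mnt/recovery
```

btrfs restore copies files out to the destination instead of writing to
the damaged filesystem, which is why it is a reasonable thing to try
before giving up on the data.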
In general, if you haven't already, I'd suggest spending some time
reading the documentation on the wiki. You can also read some of the
back-list posts right here, say from gmane.org, since not all the common
wisdom on the list may be on the wiki just yet. I use the nntp/news
interface, but there's also a web interface (actually multiple web
interfaces, take your pick =:^), if you're not familiar with nntp or
simply don't want to bother setting up a proper nntp client.
If that doesn't work either, well, it may be time to mkfs.btrfs and fall
back to the backups.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman