* Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-21 1:27 UTC
To: linux-btrfs
I changed the file system label a few times. When I tried to mount it
afterwards, it would not mount:
# mount /dev/sdb1 /mnt
mount: Not a directory
After the above command, dmesg shows the following:
[ 5198.413202] BTRFS info (device sdb1): disk space caching is enabled
[ 5198.629958] BTRFS: checking UUID tree
I have lots of manually sorted downloaded files on this partition
(in other words nothing very important but downloading and sorting all
files again would require a lot of time), so I would appreciate any
help. This is what I have tried so far to restore it:
# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' failed.
zsh: abort btrfs check /dev/sdb1
Since it failed after "checking extents" I decided to try
--init-extent-tree:
# btrfs check --init-extent-tree /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Creating a new extent tree
Failed to find [29376512, 168, 16384]
btrfs unable to find ref byte nr 29376512 parent 0 root 1 owner 1 offset 0
Failed to find [30818304, 168, 16384]
btrfs unable to find ref byte nr 30818304 parent 0 root 1 owner 0 offset 1
Failed to find [47546368, 168, 16384]
btrfs unable to find ref byte nr 47546368 parent 0 root 1 owner 0 offset 1
parent transid verify failed on 29442048 wanted 4 found 2758
Ignoring transid failure
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' failed.
zsh: abort btrfs check --init-extent-tree /dev/sdb1
# btrfs restore /dev/sdb1 /media/backup/sdb1 # this command exits after a second with return code 0
# echo $?
0
I also tried btrfs restore with --path-regex and got the same result.
# btrfs-find-root /dev/sdb1
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
https://btrfs.wiki.kernel.org/index.php/Restore talks about picking the root
with the largest transid, but I do not see "transid" in my output, so I am
not sure what to do.
I also tried btrfsck:
# btrfsck /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 0x0000000001074020 ***
zsh: abort btrfsck /dev/sdb1
# btrfsck -b /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 0x00000000024e8020 ***
zsh: abort btrfsck -b /dev/sdb1
# btrfsck --repair /dev/sdb1
enabling repair mode
*** Error in `btrfs check': double free or corruption (fasttop): 0x0000000000e26020 ***
zsh: abort btrfsck --repair /dev/sdb1
# uname -a
Linux debian 3.15.0-pf2 #1 SMP Sat Jun 28 15:09:48 EEST 2014 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.14.1
# btrfs fi show
Label: 'label' uuid: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Total devices 1 FS bytes used 411.76GiB
devid 1 size 465.76GiB used 465.76GiB path /dev/sdb1
Btrfs v3.14.1
* Re: Changing label few times killed filesystem?
From: Chris Murphy @ 2014-11-21 2:20 UTC
To: Btrfs BTRFS
On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com> wrote:
> Since it failed after "checking extents" I decided to try
> --init-extent-tree:
There might be hope yet if you didn't use --repair, which the wiki and
many posts on this list describe as kind of a last resort. But at the
very least, before going with the hammer approach, you should upgrade
your btrfs-progs, which is kind of old. Current is 3.17.2. I suggest
upgrading and just posting the results from 'btrfs check <device>'
without any options to see what you get. The check and --repair code is
mostly in btrfs-progs, whereas the mount-time fixing code is in the
kernel. So upgrading btrfs-progs may be sufficient for your case, but
ultimately it might be necessary to go to a newer kernel also.
> Btrfs v3.14.1
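As a rough illustration of the version gap discussed above, the following Python sketch parses the `btrfs --version` string quoted in this thread and compares it with the 3.17.2 release mentioned as current. The helper name `parse_progs_version` is hypothetical, and the parsing assumes the "Btrfs vX.Y[.Z]" format shown here; other output formats are not handled.

```python
import re

def parse_progs_version(text):
    """Extract a numeric version tuple from `btrfs --version` output.

    Hypothetical helper; assumes the 'Btrfs vX.Y[.Z]' format seen in this thread.
    """
    match = re.search(r"v(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))

# "Current is 3.17.2", according to the message above.
CURRENT_PROGS = (3, 17, 2)

installed = parse_progs_version("Btrfs v3.14.1")
print(installed, "older than current:", installed < CURRENT_PROGS)
```

Tuple comparison is used so that 3.14.1 sorts before 3.17.2 numerically rather than as text.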
--
Chris Murphy
* Re: Changing label few times killed filesystem?
From: Roman Mamedov @ 2014-11-21 4:35 UTC
To: Boris Chernov; +Cc: linux-btrfs
On Fri, 21 Nov 2014 01:27:17 +0000
Boris Chernov <aqs1024@hotmail.com> wrote:
>
> I have changed file system label few times in total. When I tried
> to mount it after that, it became not mountable:
>
> # mount /dev/sdb1 /mnt
> mount: Not a directory
I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
Before mounting, try things like "ls -la /mnt/", "umount /mnt", etc.
Or simply try mounting somewhere other than /mnt/.
--
With respect,
Roman
* Re: Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-21 8:49 UTC
To: Roman Mamedov; +Cc: linux-btrfs
On 2014-11-21 04:35, Roman Mamedov wrote:
> On Fri, 21 Nov 2014 01:27:17 +0000
> Boris Chernov <aqs1024@hotmail.com> wrote:
>> I have changed file system label few times in total. When I tried
>> to mount it after that, it became not mountable:
>>
>> # mount /dev/sdb1 /mnt
>> mount: Not a directory
> I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
> Before mounting try things like "ls -la /mnt/", "umount /mnt", etc.
> Or simply mounting somewhere else other than /mnt/
Before I attempted mounting on /mnt, I tried to mount with the KDE
Device Notifier on /media/username/label, then I created a directory
manually in /media/ and tried to mount from the command line, then I
tried /mnt, and the error was the same each time. So I'm sure there is
nothing wrong with my mount points.
Now I have rebooted and tried to mount with the KDE Device Notifier on
/media/username/label. It failed again, so I tried from the command line
as root:
# mkdir /media/sdb1 && ls -la /media/sdb1 && mount /dev/sdb1 /media/sdb1
total 8
drwxr-sr-x 2 root disk 4096 Nov 21 08:12 .
drwsrwsrwT 7 root disk 4096 Nov 21 08:12 ..
...and that's it, no output from the mount command (it just hung and
became an unkillable process). Please let me know if there is anything
else I could try, either to restore the file system or to debug it (to at
least understand why exactly it broke, so that it does not happen again
to me or anyone else). If it matters, the disk has a single partition
(BTRFS only) and was plugged in all the time, and I use a Xeon-based
workstation with ECC memory. In dmesg I see the following. It seems that,
after encountering btrfs bugs in its recovery tools (mentioned in my
previous mail), I have also encountered a btrfs bug in the kernel:
[ 339.349260] BTRFS info (device sdb1): disk space caching is enabled
[ 339.397438] parent transid verify failed on 29458432 wanted 5 found 2759
[ 339.397505] ------------[ cut here ]------------
[ 339.397510] kernel BUG at fs/btrfs/locking.c:269!
[ 339.397513] invalid opcode: 0000 [#1] SMP
[ 339.397517] Modules linked in: ppp_deflate bsd_comp ppp_async
crc_ccitt ppp_generic slhc snd_aloop snd_hrtimer xt_conntrack
iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables
snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 snd_i2c snd_ice17xx_ak4xxx
snd_ak4xxx_adda snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device
snd_timer snd soundcore ac97_bus vmnet(O) parport_pc parport
vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) cpufreq_conservative
cpufreq_powersave cpufreq_userspace cpufreq_stats zram nvidia(PO)
cfg80211 rfkill binfmt_misc uinput zfs(PO) zunicode(PO) zavl(PO)
zcommon(PO) znvpair(PO) spl(O) nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd fscache sunrpc iTCO_wdt iTCO_vendor_support usblp kvm_intel kvm
ses enclosure cdc_ether psmouse option i2c_i801 pcspkr usbnet mii
usb_wwan usbserial serio_raw i7core_edac edac_core uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media evdev
joydev jc42 w83627ehf lm90 coretemp adt7475 hwmon_vid adm1021 ttm
drm_kms_helper drm i2c_algo_bit i2c_core msr loop fuse tpm_infineon
tpm_tis lpc_ich mfd_core tpm button acpi_cpufreq processor thermal_sys
autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq usb_storage sg sd_mod
sr_mod cdrom crc_t10dif crct10dif_common hid_generic usbhid hid ahci
libahci libata crc32c_intel scsi_mod e1000e ptp pps_core xhci_hcd
ehci_pci ehci_hcd usbcore usb_common [last unloaded: vmnet]
[ 339.397584] CPU: 0 PID: 25752 Comm: mount Tainted: P O 3.15.0-pf2 #1
[ 339.397585] Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2 08/19/11
[ 339.397586] task: ffff880036c93f80 ti: ffff8805702b4000 task.ti: ffff8805702b4000
[ 339.397587] RIP: 0010:[<ffffffffa0245050>] [<ffffffffa0245050>] btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[ 339.397604] RSP: 0018:ffff8805702b7bf0 EFLAGS: 00010246
[ 339.397605] RAX: 0000000000000000 RBX: ffff8804db6da800 RCX: 0000000000000581
[ 339.397606] RDX: 0000000000000000 RSI: ffff8804db58d0e0 RDI: ffff8804db6da800
[ 339.397607] RBP: 0000000000000001 R08: 000000000001b830 R09: ffff88063fc1b830
[ 339.397608] R10: ffff88061afec700 R11: ffffea00136d6300 R12: 0000000000000005
[ 339.397609] R13: ffff88008c978820 R14: ffff88061af51000 R15: ffff8804db6da800
[ 339.397610] FS: 00007f55bf45b840(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
[ 339.397612] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 339.397613] CR2: 00007f6b280af000 CR3: 00000004da047000 CR4: 00000000000007f0
[ 339.397614] Stack:
[ 339.397614] ffffffffa024557d ffff8804db6da800 ffffffffa0208838 0000000000000000
[ 339.397616] 0000000000000000 0000000000000000 0000000000000000 ffff88008c978820
[ 339.397617] ffffffffa02093a0 0000000000001c18 0000000000000005 ffff8804db6da800
[ 339.397619] Call Trace:
[ 339.397629] [<ffffffffa024557d>] ? btrfs_tree_read_unlock_blocking+0x8d/0xc0 [btrfs]
[ 339.397637] [<ffffffffa0208838>] ? verify_parent_transid+0x118/0x1a0 [btrfs]
[ 339.397645] [<ffffffffa02093a0>] ? btree_read_extent_buffer_pages.constprop.46+0xc0/0x110 [btrfs]
[ 339.397653] [<ffffffffa020a46e>] ? read_tree_block+0x2e/0x50 [btrfs]
[ 339.397662] [<ffffffffa020b90e>] ? btrfs_read_tree_root+0x10e/0x180 [btrfs]
[ 339.397670] [<ffffffffa020e745>] ? open_ctree+0x1495/0x1e90 [btrfs]
[ 339.397677] [<ffffffffa01e791d>] ? btrfs_mount+0x6bd/0x880 [btrfs]
[ 339.397683] [<ffffffff81191f71>] ? mount_fs+0x31/0x1b0
[ 339.397687] [<ffffffff811ac63d>] ? vfs_kern_mount+0x5d/0x110
[ 339.397690] [<ffffffff811aecb5>] ? do_mount+0x225/0xa50
[ 339.397693] [<ffffffff811393b8>] ? memdup_user+0x38/0x70
[ 339.397695] [<ffffffff811af7fb>] ? SyS_mount+0x9b/0x110
[ 339.397698] [<ffffffff814de3f9>] ? system_call_fastpath+0x16/0x1b
[ 339.397699] Code: ee e0 b9 ea ff ff ff e9 64 ff ff ff 4c 8b a4 24 90 00 00 00 b9 ea ff ff ff e9 52 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 0b 66 66 66
[ 339.397715] RIP [<ffffffffa0245050>] btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[ 339.397722] RSP <ffff8805702b7bf0>
[ 339.397822] ---[ end trace 335f63b7cdc66864 ]---
[ 341.358672] perf interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
* Re: Changing label few times killed filesystem?
From: Duncan @ 2014-11-21 11:47 UTC
To: linux-btrfs
Chris Murphy posted on Thu, 20 Nov 2014 19:20:22 -0700 as excerpted:
> On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com>
> wrote:
>
>> Since it failed after "checking extents" I decided to try
>> --init-extent-tree:
>
> There might be hope yet if you didn't use --repair which is said on the
> wiki and many times on this list is kindof a last resort. But at the
> very least before going with the hammer approach you should upgrade your
> btrfs-progs which is kind old. Current is 3.17.2. I suggest upgrading
> and just posting the results from 'btrfs check <device>' without any
> options and see what you get. This check and --repair code are mostly in
> btrfs-progs, whereas the mount time fixing code is in the kernel. So
> upgrading btrfs-progs may be sufficient for your case, but ultimately it
> might be necessary to go to a newer kernel also.
>
>> Btrfs v3.14.1
I'm with Chris here.
Additionally, I note that you (OP) are using kernel 3.15.x, while the
entire kernel 3.15 series (which wasn't long-term supported so the last
kernel update was shortly after 3.16 was released) is effectively
blacklisted for btrfs, as it had a major btrfs bug in the compression
handling code. (However, if you are not now and never did use
compression on that filesystem, that bug shouldn't affect you, but others
might.) The same bug was in 3.16.0 and 3.16.1, but was fixed in 3.16.2
(or was it 3.16.3) plus. So later 3.16 series kernels should be
reasonably good.
Unfortunately, 3.17 added another bug, this time with read-only snapshot
handling. I don't do snapshots here and have been running it fine, but
you'll want 3.17.2 plus if you do read-only snapshots.
I've not yet switched to kernel 3.18 series (late development stage at
this point) here, but while there was a problem in rc4, rc5 appears to be
good according to reports I've seen.
Meanwhile, userspace-side, there have been a number of fixes to btrfs
check and the restore code in the 3.16 and 3.17 series, and while running
the latest userspace isn't as critical as the kernel for normal
operations (online operations) since for them the kernel is the
operational code, for fixup (offline operations like btrfs check and
btrfs restore), you really do want to be running the latest userspace,
because in that case it's the userspace code that's actually doing the
work.
Meanwhile, in the other subthread you mentioned not understanding
transid. FWIW transaction ID and generation are used interchangeably in
btrfs discussions and refer to the same thing -- a monotonically
increasing number that gets bumped every time the root tree and
superblocks are committed. Normally later generations/transids indicate
later commits and thus a filesystem state closer to current. Note
that you can use btrfs-show-super to display information from the
superblocks including what it thinks the current generation/transid
should be.
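Since generation and transid are the same number, the btrfs-find-root output posted earlier can be read as a list of candidate roots ordered by generation. A minimal Python sketch of that reading follows; the function name `candidate_roots` is hypothetical and the regular expressions assume only the three line shapes that appear in this thread.

```python
import re

# Sample lines copied from the btrfs-find-root output earlier in this thread.
SAMPLE = """\
Well block 4194304 seems great, but generation doesn't match, have=2, want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
"""

def candidate_roots(output):
    """Return (generation, bytenr) pairs, newest generation first."""
    roots = []
    for line in output.splitlines():
        m = re.search(r"Well block (\d+) .*have=(\d+)", line)
        if m:
            roots.append((int(m.group(2)), int(m.group(1))))
            continue
        m = re.search(r"Found tree root at (\d+) gen (\d+)", line)
        if m:
            roots.append((int(m.group(2)), int(m.group(1))))
    return sorted(roots, reverse=True)

for generation, bytenr in candidate_roots(SAMPLE):
    print("generation", generation, "at byte", bytenr)
```

The byte number of a chosen candidate is the value that btrfs restore takes through its -t option. In this output the only candidates older than the current root are generations 2 and 3, which is far from the state of the filesystem at generation 2759.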
Which brings us to the output. In most cases when there's problems with
the transid/generation, wanted will be a bit higher than found, something
like found 25456, wanted 25459. That simply means that the three latest
commits got lost somewhere and you may have to settle for an older one
(which is where the wiki restore article you mentioned comes in).
But there were a number of reports recently where wanted was *MUCH*
*LOWER* than found (like wanted 5, found 2753), which is what you're
seeing. Unfortunately I don't remember the resolution of those reports,
or indeed, if the bug has been traced yet.
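To make the two cases easier to tell apart in a long log, here is a small Python sketch that extracts "parent transid verify failed" lines and labels each one by the direction of the mismatch. The function name and the labels are made up for illustration; the pattern assumes the message format shown in the dmesg excerpts in this thread.

```python
import re

TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def classify_transid_failures(log_text):
    """Label each failure by whether 'wanted' is above or below 'found'."""
    results = []
    for m in TRANSID_RE.finditer(log_text):
        bytenr, wanted, found = (int(g) for g in m.groups())
        if wanted > found:
            kind = "wanted newer than found"   # the usual case: recent commits lost
        elif wanted < found:
            kind = "wanted older than found"   # the unusual case seen in this thread
        else:
            kind = "equal"
        results.append((bytenr, wanted, found, kind))
    return results

log = "[  339.397438] parent transid verify failed on 29458432 wanted 5 found 2759"
for entry in classify_transid_failures(log):
    print(entry)
```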
There is, however, another bug where the transid was being zeroed:
wanted 26473, found 0. (Possibly the reports above came after this bug
hit, if in some cases it didn't stop further commits, so that the
generation was reset to zero and then increased again from there.) One
of the devs mentioned tracing that one, tho again I'm not sure of the
current status, except that they mentioned it so they're obviously
working on it.
To my knowledge, these were *NOT* in the context of relabeling, however,
so it's quite possible you're seeing the one bug, and the relabeling is
simply coincidence.
Again, however, you're running a 3.15 kernel that's effectively btrfs
blacklisted, and an even older 3.14 userspace. I can't promise
upgrading will give you an actual fix, but getting current on your
kernel and userspace will at least get you on the same page as most
folks here. Then we know we're not dealing with old (and, in the case of
the kernel, known blacklisted) versions, and the bugs in play will at
least be current ones, not long since fixed ones. And for the kernel,
avoid the 3.15 series entirely, along with early 3.16 (before 3.16.3) and
3.17 (before 3.17.2), plus early development 3.18 (current rcs should be
better).
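The kernel ranges named above can be written down as a small check. This Python sketch is only a restatement of the advice in this message, not an authoritative list; the function name `kernel_flagged` is hypothetical, and 3.18 release candidates are left out because they do not fit a plain numeric tuple.

```python
def kernel_flagged(version):
    """Return True if `version` falls in a range the message above says to avoid.

    `version` is a (major, minor, patch) tuple. The ranges come from this
    thread only and are an assumption, not an official blacklist.
    """
    major_minor = version[:2]
    if major_minor == (3, 15):
        return True                     # the whole 3.15 series
    if major_minor == (3, 16):
        return version < (3, 16, 3)     # early 3.16
    if major_minor == (3, 17):
        return version < (3, 17, 2)     # early 3.17
    return False

for v in [(3, 15, 0), (3, 16, 3), (3, 17, 0), (3, 17, 2)]:
    print(v, "avoid" if kernel_flagged(v) else "ok per the thread")
```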
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-23 11:00 UTC
To: linux-btrfs
> I suggest upgrading and just posting the results from 'btrfs check <device>'
> without any options and see what you get.
OK, I have upgraded to the 3.17.0 kernel and have also upgraded
btrfs-tools:
# btrfs --version
Btrfs v3.17
# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x41a081]
btrfs[0x41a0a5]
btrfs[0x409783]
btrfs[0x40a45e]
btrfs[0x41bfa9]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7feaf251cb45]
btrfs[0x40b497]
btrfsck /dev/sdb1 gives exactly the same output. It seems it does
not even try to check anything but just fails on the assertion. I also
tried btrfs restore:
# btrfs restore /dev/sdb1 /media/backup/sdb1 # Does nothing and exits almost immediately
# echo $?
0
Since upgrading to the new kernel, I get this when I try to mount the
partition:
# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# dmesg | tail
...
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed
> However, if you are not now and never did use compression on that filesystem,
> that bug shouldn't affect you, but others might.
I did not use compression on this partition, but I have used it on
another btrfs disk (which seems to work fine, at least for now). I do
not think I used any special features on the partition I have trouble
with (I was planning to, but it died before I got a chance).
> it's quite possible you're seeing the one bug, and the relabeling is
> simply coincidence.
I suppose it is possible that something else was the cause, but the
only other thing I did with the file system at the time was
mounting/unmounting it. Also, I did not use it much, just for a few
weeks; before that the disk was unplugged for a few months (with no files
on it). And the only things I did with it (before it stopped working)
were creating, moving, copying and deleting files.
Before upgrading btrfs-tools and the kernel, I tried to reproduce
the issue by creating a big file containing a btrfs file system, but I
was unable to reproduce the problem. However, I did not put as many files
on it as on the real partition, and it was smaller. In other words, the
issue I have encountered seems to be hard to reproduce, so I cannot tell
with 100% certainty what exactly caused the corruption.
Is there anything else I can try? If not to restore it, then to
provide more useful debug information (if possible in this case). I
could try compiling the latest development versions of the kernel and/or
btrfs-tools if there is a chance that might help.
P.S. Only the shortest reply, the one about the "mount" command,
reached my mailbox, so I was able to read the other replies only a few
days later when they appeared on gmane (I wasn't subscribed at the time
because I did not expect gmane to be so slow). This time I subscribed to
the list, so hopefully I will be able to read all replies without delay.
* Re: Changing label few times killed filesystem?
From: Duncan @ 2014-11-24 2:46 UTC
To: linux-btrfs
Boris Chernov posted on Sun, 23 Nov 2014 11:00:16 +0000 as excerpted:
> P.S. I received on my mail only shortest reply about "mount"
> command, so I was able to read other replies only after few days when
> they appeared on gmane (I wasn't subscribed at the time because I did
> not expect gmane to be so slow). This time I subscribed to the list so
> hopefully I will be able to read all replies without delay.
FWIW I use gmane's list2news service here, and didn't experience such
delays (maybe a few hours here and there, but...).
However, if you were using gmane's web service, that explains things as
weaverd, the process that does the threading on the web side, was down
for some days, and Lars (gmane's owner and primary admin, there's others
but only Lars is able to do some things) only found out about it when he
followed up on a report from someone in gmane.discuss. Check out that
list/group for more.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-25 11:04 UTC
To: linux-btrfs
On 2014-11-24 02:46, Duncan wrote:
> if you were using gmane's web service, that explains things as weaverd, the process
> that does the threading on the web side, was down for some days
Yes, I used the gmane blog. Good to know it is not down anymore.
Back on topic. Even after updating to the latest version, btrfsck
still does not work with any of its options, including --repair. Does
anyone know what "Assertion `rec->is_root` failed" means? Is it worth
trying to compile my own version of btrfsck without this assertion?
With or without the --repair option, it looks like this assertion stops
btrfsck very early, preventing it from checking the filesystem or
attempting to repair it.
# btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb275f24b45]
btrfs check[0x40b497]
# btrfsck --repair /dev/sdb1
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc5b8dab45]
btrfs check[0x40b497]
* Re: Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-25 16:46 UTC
To: linux-btrfs
In an attempt to get more information, I have commented out
BUG_ON(rec->is_root) in cmds-check.c to let btrfsck check my file system
without failing on this assertion. Below you can see the output. I would
appreciate any help or ideas...
# btrfsck /dev/sdb1 # Full log can be downloaded here: http://pastebin.com/D68vr69J
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...messages like these repeat many times, download full log to see them all...
ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 not found in extent tree
backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
...
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 not found
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
referenced 0
Btrfs v3.17
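The very large root number in the log above, 18446744073709551607, is easier to read once it is reinterpreted as a signed 64-bit value. The Python sketch below does only that arithmetic; the function name `as_signed64` is hypothetical. To my knowledge btrfs reserves a few tree objectids as small negative numbers, with -9 being the data relocation tree, but treat that mapping as an assumption to verify against the btrfs headers.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement value."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value

# Root number taken from the btrfsck output above.
print(as_signed64(18446744073709551607))  # prints -9
```

If that reading is right, the backref complaints for that root concern an internal tree rather than a user subvolume, which would fit the "file data blocks allocated: 0" summary, though this is a guess and not a diagnosis.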
* Re: Changing label few times killed filesystem?
From: Boris Chernov @ 2014-11-27 18:27 UTC
To: linux-btrfs
Since nobody had any other suggestions, I decided to try running the
modified btrfsck (without the BUG_ON(rec->is_root) assertion) with the
--repair option.
Surprisingly, the modified btrfsck --repair fixed all errors but one
(according to btrfsck), and btrfsck asked me to run btrfsck --repair one
more time to fix the remaining error. Mounting still did not work at
this point, so I did what btrfsck suggested. At first it said it had
fixed the remaining error, but then it found many more errors (I am not
sure if btrfsck caused them or if they were already present and fixing
the remaining error just uncovered them).
btrfs restore (with or without the -t option) returns with a zero exit
code without even attempting to do anything (as it did before I tried
--repair). Mounting with or without the "recovery" option produces the
same errors (they were exactly the same before --repair, so I already
mentioned them in my previous message, but for convenience I show them
again in the log below). "btrfs rescue chunk-recover" and "btrfs rescue
super-recover" say that everything is OK.
Does anybody have any ideas or suggestions?
Please do not be afraid to suggest something risky - at this point
I have nothing to lose, because if I cannot restore the files or provide
further debug information for developers, I have to reformat this
partition anyway. Ideas about what could have caused this corruption are
also welcome, because currently I find it hard to believe that relabeling
or mounting/unmounting were the only reasons.
Below I show exactly what I did, along with some parts of the terminal
output (for readability I removed repeated similar messages; please
download the full log if you are interested).
# btrfsck --repair /dev/sdb1 # Full log can be downloaded here: http://pastebin.com/MdyjxY4w
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
ref mismatch on [20971520 16384] extent item 0, found 1
adding new tree backref on start 20971520 len 16384 parent 3 root 3
Backref 20971520 parent 3 root 3 not found in extent tree
backpointer mismatch on [20971520 16384]
...
owner ref check failed [47529984 16384]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0
errors 3, no dir item, no dir index
Failed to find [30769152, 168, 16384]
btrfs unable to find ref byte nr 30769152 parent 0 root 5 owner 0 offset 1
reset isize for dir 6 root 5
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0
errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
referenced 0
Btrfs v3.17
To my surprise, btrfsck showed great improvements (after btrfsck
--repair) and asked me to run btrfsck --repair one more time to fix the
remaining error:
# btrfsck /dev/sdb1
root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.
Before running btrfsck --repair again, I tried to mount, but
it did not work:
# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# dmesg | tail
...
[268827.386951] BTRFS info (device sdb1): disk space caching is enabled
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.405135] BTRFS: open_ctree failed
Since btrfsck told me to run it with the --repair option again, I did:
# btrfsck --repair /dev/sdb1 # Full log is available here: http://pastebin.com/pcWte3Ru
enabling repair mode
fixing root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Fixed 1 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
parent transid verify failed on 29425664 wanted 1087 found 2763
...
Ignoring transid failure
leaf parent key incorrect 29425664
bad block 29425664
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) is not found in block group
Chunk[256, 228, 0] stripe[1, 0] is not found in dev extent
...
Dev extent's total-byte(0) is not equal to byte-used(500107771904) in dev[1, 216, 1]
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0
errors 3, no dir item, no dir index
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0
errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
Ignoring transid failure
found 453869568 bytes used err is 1
total csum bytes: 0
total tree bytes: 1785856
total fs tree bytes: 16384
total extent tree bytes: 16384
btree space waste bytes: 809878
file data blocks allocated: 0
referenced 0
Btrfs v3.17
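One detail in the output above may be worth noting: the "new gen" and "wanted" value 4294967296 is exactly 2^32, far beyond the generations in the low thousands reported everywhere else in this thread. The Python sketch below only does that arithmetic; the function name `describe_generation` is hypothetical, and the observation is not a diagnosis of where the value came from.

```python
def describe_generation(gen):
    """Report whether a generation value fits in 32 bits and split it into halves."""
    high, low = divmod(gen, 2**32)
    return {"fits_in_32_bits": gen < 2**32, "high_32": high, "low_32": low}

# Values taken from the btrfsck output above.
print(describe_generation(4294967296))
print(describe_generation(2758))
```

A value with a zero low half and a high half of 1 looks more like a misread or misplaced field than a real commit count, but that is speculation on my part.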
If I try to mount it again, the error in dmesg remains the same as
before, and btrfsck shows that the errors which appeared after the second
--repair are still present (they can be seen in the log above). I also
tried "btrfs rescue", but this did not make any difference (I still can't
use "btrfs restore" or mount):
# btrfs rescue super-recover /dev/sdb1
All supers are valid, no need to recover
# btrfs rescue chunk-recover /dev/sdb1 -v # Full log is available here: http://pastebin.com/7knR1afA
All Devices:
Device: id = 1, name = /dev/sdb1
DEVICE SCAN RESULT:
Filesystem Information:
sectorsize: 4096
leafsize: 16384
tree root generation: 2765
chunk root generation: 952
...
Bad Chunks:
Total Chunks: 469
Heathy: 469
Bad: 0
Orphan Block Groups:
Orphan Device Extents:
Check chunks successfully with no orphans
Recover the chunk tree successfully.