* Changing label few times killed filesystem?
@ 2014-11-21 1:27 Boris Chernov
2014-11-21 2:20 ` Chris Murphy
2014-11-21 4:35 ` Roman Mamedov
0 siblings, 2 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-21 1:27 UTC (permalink / raw)
To: linux-btrfs
I have changed the file system label a few times in total. When I tried
to mount it after that, it was no longer mountable:
# mount /dev/sdb1 /mnt
mount: Not a directory
In dmesg I see the following after the above command:
[ 5198.413202] BTRFS info (device sdb1): disk space caching is enabled
[ 5198.629958] BTRFS: checking UUID tree
I have lots of manually sorted downloaded files on this partition
(in other words nothing very important but downloading and sorting all
files again would require a lot of time), so I would appreciate any
help. This is what I have tried so far to restore it:
# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)'
failed.
zsh: abort btrfs check /dev/sdb1
Since it failed after "checking extents" I decided to try
--init-extent-tree:
# btrfs check --init-extent-tree /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Creating a new extent tree
Failed to find [29376512, 168, 16384]
btrfs unable to find ref byte nr 29376512 parent 0 root 1 owner 1 offset 0
Failed to find [30818304, 168, 16384]
btrfs unable to find ref byte nr 30818304 parent 0 root 1 owner 0 offset 1
Failed to find [47546368, 168, 16384]
btrfs unable to find ref byte nr 47546368 parent 0 root 1 owner 0 offset 1
parent transid verify failed on 29442048 wanted 4 found 2758
Ignoring transid failure
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)'
failed.
zsh: abort btrfs check --init-extent-tree /dev/sdb1
# btrfs restore /dev/sdb1 /media/backup/sdb1 # this command exits
after a second with 0 return code
# echo $?
0
I also tried btrfs restore with --path-regex and got the same result.
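Because an exit code of 0 says nothing about whether any files were actually written, the result can be cross-checked by counting files in the destination before and after the run. The following is only a minimal sketch in Python; the helper names `count_files` and `run_and_verify` are hypothetical and not part of btrfs-progs.

```python
import os
import subprocess


def count_files(root):
    """Count regular directory entries below root (0 if root does not exist)."""
    return sum(len(files) for _, _, files in os.walk(root))


def run_and_verify(cmd, dest):
    """Run cmd and report (exit code, number of new files under dest).

    Hypothetical helper: it only compares file counts, so it cannot tell
    whether existing files were overwritten.
    """
    before = count_files(dest)
    result = subprocess.run(cmd)
    after = count_files(dest)
    return result.returncode, after - before
```

Called as `run_and_verify(["btrfs", "restore", "/dev/sdb1", "/media/backup/sdb1"], "/media/backup/sdb1")`, a result of `(0, 0)` would correspond to the behaviour described above: success reported, nothing restored.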
# btrfs-find-root /dev/sdb1
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2,
want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3,
want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
https://btrfs.wiki.kernel.org/index.php/Restore talks about picking the
root with the largest transid, but I do not see "transid" in my output,
so I am not sure what to do.
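For reference, "generation" and "transid" are used interchangeably in btrfs discussions, so the `gen`, `have=` and `want=` numbers in the output above are the values to compare. A minimal Python sketch, assuming the exact line formats shown above (with the wrapped lines rejoined), lists the candidate roots newest first. The function names are hypothetical.

```python
import re

# Lines copied from the btrfs-find-root output above, wrapped lines rejoined.
SAMPLE = """\
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
"""

CANDIDATE = re.compile(
    r"Well block (\d+) seems great, but generation doesn't match, "
    r"have=(\d+), want=(\d+) level (\d+)"
)
FOUND = re.compile(r"Found tree root at (\d+) gen (\d+) level (\d+)")


def parse_find_root(text):
    """Return a list of (bytenr, generation, level) tuples."""
    roots = []
    for line in text.splitlines():
        m = CANDIDATE.search(line)
        if m:
            # have= is the generation actually stored in that block.
            roots.append((int(m.group(1)), int(m.group(2)), int(m.group(4))))
            continue
        m = FOUND.search(line)
        if m:
            roots.append(tuple(int(g) for g in m.groups()))
    return roots


def newest_first(roots):
    """Sort candidates by generation, highest (most recent commit) first."""
    return sorted(roots, key=lambda r: r[1], reverse=True)
```

The byte number of a chosen candidate is what the `-t` option of `btrfs restore` takes. Whether any of these candidates is usable on this particular filesystem is a separate question that the sketch does not answer.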
I also tried btrfsck:
# btrfsck /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop):
0x0000000001074020 ***
zsh: abort btrfsck /dev/sdb1
# btrfsck -b /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop):
0x00000000024e8020 ***
zsh: abort btrfsck -b /dev/sdb1
# btrfsck --repair /dev/sdb1
enabling repair mode
*** Error in `btrfs check': double free or corruption (fasttop):
0x0000000000e26020 ***
zsh: abort btrfsck --repair /dev/sdb1
# uname -a
Linux debian 3.15.0-pf2 #1 SMP Sat Jun 28 15:09:48 EEST 2014 x86_64
GNU/Linux
# btrfs --version
Btrfs v3.14.1
# btrfs fi show
Label: 'label' uuid: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Total devices 1 FS bytes used 411.76GiB
devid 1 size 465.76GiB used 465.76GiB path /dev/sdb1
Btrfs v3.14.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-21 1:27 Changing label few times killed filesystem? Boris Chernov
@ 2014-11-21 2:20 ` Chris Murphy
2014-11-21 11:47 ` Duncan
2014-11-21 4:35 ` Roman Mamedov
1 sibling, 1 reply; 10+ messages in thread
From: Chris Murphy @ 2014-11-21 2:20 UTC (permalink / raw)
To: Btrfs BTRFS
On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com> wrote:
> Since it failed after "checking extents" I decided to try
> --init-extent-tree:
There might be hope yet if you didn't use --repair, which, as is said on
the wiki and many times on this list, is kind of a last resort. But at the
very least, before going with the hammer approach, you should upgrade your
btrfs-progs, which is kind of old. Current is 3.17.2. I suggest upgrading
and just posting the results from 'btrfs check <device>' without any
options and see what you get.
This check and --repair code are mostly in btrfs-progs, whereas the
mount-time fixing code is in the kernel. So upgrading btrfs-progs may be
sufficient for your case, but ultimately it might be necessary to go to a
newer kernel also.
> Btrfs v3.14.1
--
Chris Murphy
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-21 2:20 ` Chris Murphy
@ 2014-11-21 11:47 ` Duncan
0 siblings, 0 replies; 10+ messages in thread
From: Duncan @ 2014-11-21 11:47 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Thu, 20 Nov 2014 19:20:22 -0700 as excerpted:
> On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com>
> wrote:
>
>> Since it failed after "checking extents" I decided to try
>> --init-extent-tree:
>
> There might be hope yet if you didn't use --repair, which, as is said on
> the wiki and many times on this list, is kind of a last resort. But at
> the very least, before going with the hammer approach, you should upgrade
> your btrfs-progs, which is kind of old. Current is 3.17.2. I suggest
> upgrading and just posting the results from 'btrfs check <device>'
> without any options and see what you get. This check and --repair code
> are mostly in btrfs-progs, whereas the mount-time fixing code is in the
> kernel. So upgrading btrfs-progs may be sufficient for your case, but
> ultimately it might be necessary to go to a newer kernel also.
>
>> Btrfs v3.14.1
I'm with Chris here.
Additionally, I note that you (OP) are using kernel 3.15.x, while the
entire kernel 3.15 series (which wasn't long-term supported so the last
kernel update was shortly after 3.16 was released) is effectively
blacklisted for btrfs, as it had a major btrfs bug in the compression
handling code. (However, if you are not now and never did use compression
on that filesystem, that bug shouldn't affect you, but others might.)
The same bug was in 3.16.0 and 3.16.1, but was fixed in 3.16.2 (or was it
3.16.3) plus. So later 3.16 series kernels should be reasonably good.
Unfortunately, 3.17 added another bug, this time with read-only snapshot
handling. I don't do snapshots here and have been running it fine, but
you'll want 3.17.2 plus if you do read-only snapshots.
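The version advice above can be written down as a small check. This is only a sketch of the ranges named in this message as of November 2014, not an authoritative list. It uses 3.16.3 as the conservative bound, since the message is unsure between 3.16.2 and 3.16.3, and the function names are hypothetical.

```python
import re


def parse_version(release):
    """Turn a release string such as '3.15.0-pf2' into (3, 15, 0)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def flagged(release):
    """Return a reason string if the kernel falls in a range named above,
    otherwise None. Reflects only the advice given in this thread."""
    major, minor, patch = parse_version(release)
    if (major, minor) == (3, 15):
        return "3.15 series: compression bug, series no longer updated"
    if (major, minor) == (3, 16) and patch < 3:
        # Assumption: 3.16.3 chosen as the conservative lower bound.
        return "early 3.16: same compression bug"
    if (major, minor) == (3, 17) and patch < 2:
        return "3.17 before 3.17.2: read-only snapshot bug"
    return None
```

For the kernel reported by the OP, `flagged("3.15.0-pf2")` returns the first reason string, while `flagged("3.17.2")` returns `None`.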
I've not yet switched to kernel 3.18 series (late development stage at
this point) here, but while there was a problem in rc4, rc5 appears to be
good according to reports I've seen.
Meanwhile, userspace-side, there have been a number of fixes to btrfs
check and the restore code in the 3.16 and 3.17 series, and while running
the latest userspace isn't as critical as the kernel for normal operations
(online operations) since for them the kernel is the operational code, for
fixup (offline operations like btrfs check and btrfs restore), you really
do want to be running the latest userspace, because in that case it's the
userspace code that's actually doing the work.
Meanwhile, in the other subthread you mentioned not understanding
transid. FWIW transaction ID and generation are used interchangeably in
btrfs discussions and refer to the same thing -- a monotonically
increasing number that gets bumped every time the root tree and
superblocks are committed. Normally later generations/transids indicate
later commits and thus a filesystem state closer to current.
Note that you can use btrfs-show-super to display information from the
superblocks including what it thinks the current generation/transid should
be.
Which brings us to the output. In most cases when there's problems with
the transid/generation, wanted will be a bit higher than found, something
like found 25456, wanted 25459. That simply means that the three latest
commits got lost somewhere and you may have to settle for an older one
(which is where the wiki restore article you mentioned comes in).
But there were a number of reports recently where wanted was *MUCH*
*LOWER* than found (like wanted 5, found 2753), which is what you're
seeing. Unfortunately I don't remember the resolution of those reports,
or indeed, if the bug has been traced yet.
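The distinction between the two patterns can be sketched as a small classifier over "parent transid verify failed" lines. This is an illustration only: the category names are hypothetical labels, and the separate `found-zero` case corresponds to the zeroed-transid reports mentioned elsewhere in this thread.

```python
import re

PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def classify(line):
    """Return (bytenr, wanted, found, kind) or None if the line does not match.

    kind is one of:
      'found-zero'    found is 0 (the zeroed-transid reports)
      'wanted-higher' the usual case, recent commits lost
      'wanted-lower'  the unusual case seen in this thread
      'equal'         wanted and found agree
    """
    m = PATTERN.search(line)
    if not m:
        return None
    bytenr, wanted, found = (int(g) for g in m.groups())
    if found == 0 and wanted > 0:
        kind = "found-zero"
    elif wanted > found:
        kind = "wanted-higher"
    elif wanted < found:
        kind = "wanted-lower"
    else:
        kind = "equal"
    return bytenr, wanted, found, kind
```

Applied to the OP's line `parent transid verify failed on 29442048 wanted 4 found 2758`, this yields the `wanted-lower` category.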
There is another bug (or possibly the above was after this one hit if it
didn't stop further commits in some cases, thus resetting the generation
to zero and increasing it again from there), however, where the transid
was being zeroed. Wanted 26473, found 0. One of the devs mentioned
tracing that one, tho again I'm not sure current status except that they
mentioned it so they're obviously working on it.
To my knowledge, these were *NOT* in the context of relabeling, however,
so it's quite possible you're seeing the one bug, and the relabeling is
simply coincidence.
Again, however, you're running a 3.15 kernel that's effectively btrfs
blacklisted, and an even older 3.14 userspace. I can't promise upgrading
will give you an actual fix, but certainly, getting current on your kernel
and userspace will at least get you on the same page as most folks here,
so we know we're not dealing with old and in the case of the kernel known
blacklisted versions, and the bugs in play will at least be current ones,
not long since fixed ones.
And for the kernel, avoid 3.15 series entirely, along with early 3.16
(before 3.16.3) and 3.17 (before 3.17.2), plus early development 3.18
(current rcs should be better).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-21 1:27 Changing label few times killed filesystem? Boris Chernov
2014-11-21 2:20 ` Chris Murphy
@ 2014-11-21 4:35 ` Roman Mamedov
2014-11-21 8:49 ` Boris Chernov
2014-11-23 11:00 ` Boris Chernov
1 sibling, 2 replies; 10+ messages in thread
From: Roman Mamedov @ 2014-11-21 4:35 UTC (permalink / raw)
To: Boris Chernov; +Cc: linux-btrfs
On Fri, 21 Nov 2014 01:27:17 +0000
Boris Chernov <aqs1024@hotmail.com> wrote:
>
> I have changed file system label few times in total. When I tried
> to mount it after that, it became not mountable:
>
> # mount /dev/sdb1 /mnt
> mount: Not a directory
I'd say that implies something is wrong with your /mnt, rather than
/dev/sdb1.
Before mounting try things like "ls -la /mnt/", "umount /mnt", etc.
Or simply mounting somewhere else other than /mnt/
--
With respect,
Roman
^ permalink raw reply [flat|nested] 10+ messages in thread
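The mount-point checks suggested above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `mount_point_problems` is hypothetical, and a clean result does not rule out problems on the device side.

```python
import os


def mount_point_problems(path):
    """Return a list of reasons why path is unsuitable as a mount point."""
    problems = []
    if not os.path.exists(path):
        problems.append("does not exist")
    elif not os.path.isdir(path):
        # This is the situation the "Not a directory" error would suggest.
        problems.append("exists but is not a directory")
    elif os.path.ismount(path):
        problems.append("something is already mounted here")
    return problems
```

An empty list for `/mnt` would mean the mount point itself looks fine, which points back at the device or the filesystem on it.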
* Re: Changing label few times killed filesystem?
2014-11-21 4:35 ` Roman Mamedov
@ 2014-11-21 8:49 ` Boris Chernov
2014-11-23 11:00 ` Boris Chernov
1 sibling, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-21 8:49 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 2014-11-21 04:35, Roman Mamedov wrote:
> On Fri, 21 Nov 2014 01:27:17 +0000
> Boris Chernov <aqs1024@hotmail.com> wrote:
>> I have changed file system label few times in total. When I tried
>> to mount it after that, it became not mountable:
>>
>> # mount /dev/sdb1 /mnt
>> mount: Not a directory
> I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
> Before mounting try things like "ls -la /mnt/", "umount /mnt", etc.
> Or simply mounting somewhere else other than /mnt/
Before I attempted mounting to /mnt, I tried to mount with KDE Device
Notifier to /media/username/label, then I created a directory manually in
/media/ and tried to mount from the command line, then tried /mnt, and the
error was the same. So I'm sure there is nothing wrong with my mount
points.
Now I have rebooted and tried to mount with KDE Device Notifier to
/media/username/label. It failed again, so I tried from the command line
as root:
# mkdir /media/sdb1 && ls -la /media/sdb1 && mount /dev/sdb1 /media/sdb1
total 8
drwxr-sr-x 2 root disk 4096 Nov 21 08:12 .
drwsrwsrwT 7 root disk 4096 Nov 21 08:12 ..
...and that's it, no output from the mount command (it just hung and
became an unkillable process).
Please let me know if there is anything else I could try to either restore
it or debug it (to at least understand why exactly it broke, so that it
will not happen again to me or anyone else). If it matters, the disk has a
single partition (BTRFS only), was plugged in all the time, and I use a
Xeon-based workstation with ECC memory.
In dmesg I see the following. It seems that, after encountering btrfs bugs
in its recovery tools (mentioned in my previous mail), I have also
encountered a btrfs bug in the kernel:
[ 339.349260] BTRFS info (device sdb1): disk space caching is enabled
[ 339.397438] parent transid verify failed on 29458432 wanted 5 found 2759
[ 339.397505] ------------[ cut here ]------------
[ 339.397510] kernel BUG at fs/btrfs/locking.c:269!
[ 339.397513] invalid opcode: 0000 [#1] SMP
[ 339.397517] Modules linked in: ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc snd_aloop snd_hrtimer xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 snd_i2c snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore ac97_bus vmnet(O) parport_pc parport vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats zram nvidia(PO) cfg80211 rfkill binfmt_misc uinput zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc iTCO_wdt iTCO_vendor_support usblp kvm_intel kvm ses enclosure cdc_ether psmouse option i2c_i801 pcspkr usbnet mii usb_wwan usbserial serio_raw i7core_edac edac_core uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media evdev joydev jc42 w83627ehf lm90 coretemp adt7475 hwmon_vid adm1021 ttm drm_kms_helper drm i2c_algo_bit i2c_core msr loop fuse tpm_infineon tpm_tis lpc_ich mfd_core tpm button acpi_cpufreq processor thermal_sys autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq usb_storage sg sd_mod sr_mod cdrom crc_t10dif crct10dif_common hid_generic usbhid hid ahci libahci libata crc32c_intel scsi_mod e1000e ptp pps_core xhci_hcd ehci_pci ehci_hcd usbcore usb_common [last unloaded: vmnet]
[ 339.397584] CPU: 0 PID: 25752 Comm: mount Tainted: P O 3.15.0-pf2 #1
[ 339.397585] Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2 08/19/11
[ 339.397586] task: ffff880036c93f80 ti: ffff8805702b4000 task.ti: ffff8805702b4000
[ 339.397587] RIP: 0010:[<ffffffffa0245050>] [<ffffffffa0245050>] btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[ 339.397604] RSP: 0018:ffff8805702b7bf0 EFLAGS: 00010246
[ 339.397605] RAX: 0000000000000000 RBX: ffff8804db6da800 RCX: 0000000000000581
[ 339.397606] RDX: 0000000000000000 RSI: ffff8804db58d0e0 RDI: ffff8804db6da800
[ 339.397607] RBP: 0000000000000001 R08: 000000000001b830 R09: ffff88063fc1b830
[ 339.397608] R10: ffff88061afec700 R11: ffffea00136d6300 R12: 0000000000000005
[ 339.397609] R13: ffff88008c978820 R14: ffff88061af51000 R15: ffff8804db6da800
[ 339.397610] FS: 00007f55bf45b840(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
[ 339.397612] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 339.397613] CR2: 00007f6b280af000 CR3: 00000004da047000 CR4: 00000000000007f0
[ 339.397614] Stack:
[ 339.397614] ffffffffa024557d ffff8804db6da800 ffffffffa0208838 0000000000000000
[ 339.397616] 0000000000000000 0000000000000000 0000000000000000 ffff88008c978820
[ 339.397617] ffffffffa02093a0 0000000000001c18 0000000000000005 ffff8804db6da800
[ 339.397619] Call Trace:
[ 339.397629] [<ffffffffa024557d>] ? btrfs_tree_read_unlock_blocking+0x8d/0xc0 [btrfs]
[ 339.397637] [<ffffffffa0208838>] ? verify_parent_transid+0x118/0x1a0 [btrfs]
[ 339.397645] [<ffffffffa02093a0>] ? btree_read_extent_buffer_pages.constprop.46+0xc0/0x110 [btrfs]
[ 339.397653] [<ffffffffa020a46e>] ? read_tree_block+0x2e/0x50 [btrfs]
[ 339.397662] [<ffffffffa020b90e>] ? btrfs_read_tree_root+0x10e/0x180 [btrfs]
[ 339.397670] [<ffffffffa020e745>] ? open_ctree+0x1495/0x1e90 [btrfs]
[ 339.397677] [<ffffffffa01e791d>] ? btrfs_mount+0x6bd/0x880 [btrfs]
[ 339.397683] [<ffffffff81191f71>] ? mount_fs+0x31/0x1b0
[ 339.397687] [<ffffffff811ac63d>] ? vfs_kern_mount+0x5d/0x110
[ 339.397690] [<ffffffff811aecb5>] ? do_mount+0x225/0xa50
[ 339.397693] [<ffffffff811393b8>] ? memdup_user+0x38/0x70
[ 339.397695] [<ffffffff811af7fb>] ? SyS_mount+0x9b/0x110
[ 339.397698] [<ffffffff814de3f9>] ? system_call_fastpath+0x16/0x1b
[ 339.397699] Code: ee e0 b9 ea ff ff ff e9 64 ff ff ff 4c 8b a4 24 90 00 00 00 b9 ea ff ff ff e9 52 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 0b 66 66 66
[ 339.397715] RIP [<ffffffffa0245050>] btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[ 339.397722] RSP <ffff8805702b7bf0>
[ 339.397822] ---[ end trace 335f63b7cdc66864 ]---
[ 341.358672] perf interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-21 4:35 ` Roman Mamedov
2014-11-21 8:49 ` Boris Chernov
@ 2014-11-23 11:00 ` Boris Chernov
2014-11-24 2:46 ` Duncan
1 sibling, 1 reply; 10+ messages in thread
From: Boris Chernov @ 2014-11-23 11:00 UTC (permalink / raw)
To: linux-btrfs
> I suggest upgrading and just posting the results from 'btrfs check <device>'
> without any options and see what you get.
OK, I have upgraded to the 3.17.0 kernel and I have also upgraded
btrfs-tools:
# btrfs --version
Btrfs v3.17
# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x41a081]
btrfs[0x41a0a5]
btrfs[0x409783]
btrfs[0x40a45e]
btrfs[0x41bfa9]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7feaf251cb45]
btrfs[0x40b497]
btrfsck /dev/sdb1 gives exactly the same output. It seems it does not even
try to check anything but just fails on the assertion.
I also tried btrfs restore:
# btrfs restore /dev/sdb1 /media/backup/sdb1 # Does nothing and exits almost immediately
# echo $?
0
After I upgraded to the new kernel, when I try to mount the partition I
get this:
# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# dmesg | tail
...
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed
> However, if you are not now and never did use compression on that filesystem,
> that bug shouldn't affect you, but others might.
I did not use compression on this partition, but I have used it on another
btrfs disk (which seems to work fine, at least for now).
I think I did not use any of the special features on the partition I have
trouble with (I was planning to, but it died before I got a chance).
> it's quite possible you're seeing the one bug, and the relabeling is simply coincidence.
I suppose it is possible that something else was the cause, but the only
other thing I did with the file system at the time was mounting and
unmounting it. Also, I did not use it much, just for a few weeks; before
that the disk was unplugged for a few months (with no files on it). And the
only things I did with it (before it stopped working) were creating,
moving, copying and deleting files.
Before upgrading btrfs-tools and the kernel I tried to reproduce the issue
by creating a big file with a btrfs file system in it, but I was unable to
reproduce the problem. However, I did not put as many files on it as on the
real partition, and it was smaller. In other words, the issue I have
encountered seems to be hard to reproduce, so I cannot tell with 100%
certainty what exactly caused the corruption.
Is there anything else I can try? If not to restore it, then to provide
more useful debug information (if possible in this case). I could try
compiling the latest development versions of the kernel and/or btrfs-tools
if there is a chance that might help.
P.S. I received by mail only the shortest reply, about the "mount"
command, so I was able to read the other replies only after a few days,
when they appeared on gmane (I wasn't subscribed at the time because I did
not expect gmane to be so slow). This time I subscribed to the list, so
hopefully I will be able to read all replies without delay.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-23 11:00 ` Boris Chernov
@ 2014-11-24 2:46 ` Duncan
2014-11-25 11:04 ` Boris Chernov
2014-11-25 16:46 ` Boris Chernov
0 siblings, 2 replies; 10+ messages in thread
From: Duncan @ 2014-11-24 2:46 UTC (permalink / raw)
To: linux-btrfs
Boris Chernov posted on Sun, 23 Nov 2014 11:00:16 +0000 as excerpted:
> P.S. I received on my mail only shortest reply about "mount"
> command, so I was able to read other replies only after few days when
> they appeared on gmane (I wasn't subscribed at the time because I did
> not expect gmane to be so slow). This time I subscribed to the list so
> hopefully I will be able to read all replies without delay.
FWIW I use gmane's list2news service here, and didn't experience such
delays (maybe a few hours here and there, but...).
However, if you were using gmane's web service, that explains things as
weaverd, the process that does the threading on the web side, was down for
some days, and Lars (gmane's owner and primary admin, there's others but
only Lars is able to do some things) only found out about it when he
followed up on a report from someone in gmane.discuss. Check out that
list/group for more.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-24 2:46 ` Duncan
@ 2014-11-25 11:04 ` Boris Chernov
2014-11-25 16:46 ` Boris Chernov
1 sibling, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-25 11:04 UTC (permalink / raw)
To: linux-btrfs
On 2014-11-24 02:46, Duncan wrote
> if you were using gmane's web service, that explains things as weaverd, the process
> that does the threading on the web side, was down for some days
Yes, I have used gmane blog. Good to know it is not down anymore.
Back on topic. Even after updating to the latest version, btrfsck or any
of its options including --repair still do not work. Does anyone know what
"Assertion `rec->is_root` failed" means? Is it worth trying to compile my
own version of btrfsck without this assertion? With or without --repair
option, it looks like this assertion stops btrfsck very early, preventing
btrfsck from checking the filesystem or attempting to repair it.
# btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb275f24b45]
btrfs check[0x40b497]
# btrfsck --repair /dev/sdb1
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc5b8dab45]
btrfs check[0x40b497]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Changing label few times killed filesystem?
2014-11-24 2:46 ` Duncan
2014-11-25 11:04 ` Boris Chernov
@ 2014-11-25 16:46 ` Boris Chernov
2014-11-27 18:27 ` Boris Chernov
1 sibling, 1 reply; 10+ messages in thread
From: Boris Chernov @ 2014-11-25 16:46 UTC (permalink / raw)
To: linux-btrfs
In an attempt to get more information, I have commented out
BUG_ON(rec->is_root) in cmds-check.c to let btrfsck check my file system
without failing on this assertion. Below you can see the output. I would
appreciate any help or ideas...
# btrfsck /dev/sdb1 # Full log can be downloaded here: http://pastebin.com/D68vr69J
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...messages like these repeat many times, download full log to see them all...
ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 not found in extent tree
backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
...
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 not found
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
referenced 0
Btrfs v3.17
^ permalink raw reply [flat|nested] 10+ messages in thread
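A note on the number 18446744073709551607 in the output above: it equals 2^64 - 9, that is, the value -9 stored in an unsigned 64-bit field. The Python sketch below shows only that arithmetic. The helper name is hypothetical, and what the filesystem uses this id for is not stated anywhere in the thread, so a large value here is not necessarily a sign of corruption by itself.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


# The root id printed by btrfsck above, read as a signed number.
root_id = as_signed64(18446744073709551607)
```

Here `root_id` evaluates to -9, while ordinary small ids such as 5 are returned unchanged.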
* Re: Changing label few times killed filesystem?
2014-11-25 16:46 ` Boris Chernov
@ 2014-11-27 18:27 ` Boris Chernov
0 siblings, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-27 18:27 UTC (permalink / raw)
To: linux-btrfs
Since nobody had any other suggestions, I decided to run the modified
btrfsck with the --repair option (without the BUG_ON(rec->is_root)
assertion). Surprisingly, the modified btrfsck --repair fixed all errors
but one (according to btrfsck), and btrfsck asked me to run btrfsck
--repair one more time to fix the remaining error. Mounting still did not
work at this point, so I did what btrfsck suggested. At first it said it
fixed the remaining error, but then it found many more errors (I am not
sure if btrfsck caused them or if they were already present and fixing the
remaining error just uncovered them).
btrfs restore (with or without the -t option) returns with a zero exit
code without even attempting to do anything (like it did before I tried
--repair). Mounting with or without the "recovery" option produces the same
errors (they were exactly the same before --repair, so I already mentioned
them in a previous message, but for convenience I mention them again in
the log below). "btrfs rescue chunk-recover" and "btrfs rescue
super-recover" say that everything is OK.
Does anybody have any ideas or suggestions? Please do not be afraid to
suggest something risky - at this point I have nothing to lose, because if
I cannot restore files or provide further debug information for
developers, I have to reformat this partition anyway. Ideas about what
could have caused this corruption are also welcome, because currently I
find it hard to believe that relabeling or mounting/unmounting were the
only reasons.
Below I show exactly what I did, along with some parts of the terminal
output (for readability I removed repeated similar messages; please
download the full log if you are interested).
# btrfsck --repair /dev/sdb1 # Full log can be downloaded here: http://pastebin.com/MdyjxY4w
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
ref mismatch on [20971520 16384] extent item 0, found 1
adding new tree backref on start 20971520 len 16384 parent 3 root 3
Backref 20971520 parent 3 root 3 not found in extent tree
backpointer mismatch on [20971520 16384]
...
owner ref check failed [47529984 16384]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0 errors 3, no dir item, no dir index
Failed to find [30769152, 168, 16384]
btrfs unable to find ref byte nr 30769152 parent 0 root 5 owner 0 offset 1
reset isize for dir 6 root 5
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
referenced 0
Btrfs v3.17
To my surprise, btrfsck showed great improvements (after btrfsck --repair)
and asked me to run btrfsck --repair one more time to fix the remaining
error:
# btrfsck /dev/sdb1
root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.
Before trying to run btrfsck --repair again, I tried to mount, but it did
not work:
# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# dmesg | tail
...
[268827.386951] BTRFS info (device sdb1): disk space caching is enabled
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.405135] BTRFS: open_ctree failed
Since btrfsck told me to run it with the --repair option again, I did:
# btrfsck --repair /dev/sdb1 # Full log is available here: http://pastebin.com/pcWte3Ru
enabling repair mode
fixing root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Fixed 1 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
parent transid verify failed on 29425664 wanted 1087 found 2763
...
Ignoring transid failure
leaf parent key incorrect 29425664
bad block 29425664
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) is not found in block group
Chunk[256, 228, 0] stripe[1, 0] is not found in dev extent
...
Dev extent's total-byte(0) is not equal to byte-used(500107771904) in dev[1, 216, 1]
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0 errors 3, no dir item, no dir index
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
Ignoring transid failure
found 453869568 bytes used err is 1
total csum bytes: 0
total tree bytes: 1785856
total fs tree bytes: 16384
total extent tree bytes: 16384
btree space waste bytes: 809878
file data blocks allocated: 0
referenced 0
Btrfs v3.17
If I try to mount it again, the error in dmesg remains the same as before,
and btrfsck shows that the errors which appeared after the second --repair
are still present (they can be seen in the log above).
I also tried "btrfs rescue", but this did not make any difference (I still
can't use "btrfs restore" or mount):
# btrfs rescue super-recover /dev/sdb1
All supers are valid, no need to recover
# btrfs rescue chunk-recover /dev/sdb1 -v # Full log is available here: http://pastebin.com/7knR1afA
All Devices:
Device: id = 1, name = /dev/sdb1
DEVICE SCAN RESULT:
Filesystem Information:
sectorsize: 4096
leafsize: 16384
tree root generation: 2765
chunk root generation: 952
...
Bad Chunks:
Total Chunks: 469
Heathy: 469
Bad: 0
Orphan Block Groups:
Orphan Device Extents:
Check chunks successfully with no orphans
Recover the chunk tree successfully.
^ permalink raw reply [flat|nested] 10+ messages in thread