Changing label few times killed filesystem?

linux-btrfs.vger.kernel.org archive mirror

* Changing label few times killed filesystem?
@ 2014-11-21  1:27 Boris Chernov
  2014-11-21  2:20 ` Chris Murphy
  2014-11-21  4:35 ` Roman Mamedov
  0 siblings, 2 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-21  1:27 UTC
  To: linux-btrfs

     I changed the file system label a few times in total. When I tried
to mount it after that, it was no longer mountable:

# mount /dev/sdb1 /mnt
mount: Not a directory

     In dmesg I see the following after the above command:

[ 5198.413202] BTRFS info (device sdb1): disk space caching is enabled
[ 5198.629958] BTRFS: checking UUID tree

     I have lots of manually sorted downloaded files on this partition 
(in other words nothing very important but downloading and sorting all 
files again would require a lot of time), so I would appreciate any 
help.  This is what I have tried so far to restore it:

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' 
failed.
zsh: abort      btrfs check /dev/sdb1

     Since it failed after "checking extents" I decided to try 
--init-extent-tree:

# btrfs check --init-extent-tree /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Creating a new extent tree
Failed to find [29376512, 168, 16384]
btrfs unable to find ref byte nr 29376512 parent 0 root 1  owner 1 offset 0
Failed to find [30818304, 168, 16384]
btrfs unable to find ref byte nr 30818304 parent 0 root 1  owner 0 offset 1
Failed to find [47546368, 168, 16384]
btrfs unable to find ref byte nr 47546368 parent 0 root 1  owner 0 offset 1
parent transid verify failed on 29442048 wanted 4 found 2758
Ignoring transid failure
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' 
failed.
zsh: abort      btrfs check --init-extent-tree /dev/sdb1

# btrfs restore /dev/sdb1 /media/backup/sdb1  # this command exits
after a second with return code 0
# echo $?
0

     I also tried btrfs restore with --path-regex and got the same result.
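
Because btrfs restore returned 0 here without recovering anything, the exit code alone says little. A minimal sketch, assuming the destination directory was empty before the run, is to check whether anything was written at all. restore_produced_files is a hypothetical helper name, not part of btrfs-progs.

```shell
# Hypothetical helper (not part of btrfs-progs): succeed only if the
# destination directory exists and contains at least one entry.
# Assumes the directory was empty before btrfs restore was run.
restore_produced_files() {
    dest=$1
    [ -d "$dest" ] && [ -n "$(ls -A "$dest" 2>/dev/null)" ]
}

# Usage (paths taken from the message above):
#   btrfs restore /dev/sdb1 /media/backup/sdb1
#   restore_produced_files /media/backup/sdb1 || echo "restore wrote nothing"
```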

# btrfs-find-root /dev/sdb1
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, 
want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, 
want=2759 level 0
Found tree root at 29360128 gen 2759 level 1

https://btrfs.wiki.kernel.org/index.php/Restore talks about picking the
root with the largest transid, but I do not see "transid" in my output,
so I am not sure what to do.
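
As a later reply in this thread notes, "generation" and "transid" refer to the same number, so the candidates can be ranked by their have= value. A rough sketch, assuming the one-line-per-candidate output format shown above for btrfs-progs v3.14.1 (other versions may print something different). list_root_candidates is a hypothetical helper name, and there is no guarantee that an older root is actually usable.

```shell
# Hypothetical helper: read btrfs-find-root output on stdin and print
# "generation bytenr" for every candidate block, highest generation first.
# Assumes lines of the form:
#   Well block <bytenr> seems great, but generation doesn't match, have=<gen>, want=<gen> level <n>
list_root_candidates() {
    awk '
    /^Well block/ {
        gen = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^have=/) {
                gen = $i
                sub(/^have=/, "", gen)   # drop the "have=" prefix
                sub(/,$/, "", gen)       # drop the trailing comma
            }
        }
        if (gen != "") print gen, $3
    }' | sort -k1,1nr
}

# Usage:
#   btrfs-find-root /dev/sdb1 2>&1 | list_root_candidates
#   btrfs restore -t <bytenr> /dev/sdb1 /media/backup/sdb1
```

The -t option of btrfs restore is the one mentioned later in this thread; which bytenr to pass remains a judgment call.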

     I also tried btrfsck:

# btrfsck /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 
0x0000000001074020 ***
zsh: abort      btrfsck /dev/sdb1

# btrfsck -b /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 
0x00000000024e8020 ***
zsh: abort      btrfsck -b /dev/sdb1

# btrfsck --repair /dev/sdb1
enabling repair mode
*** Error in `btrfs check': double free or corruption (fasttop): 
0x0000000000e26020 ***
zsh: abort      btrfsck --repair /dev/sdb1

# uname -a
Linux debian 3.15.0-pf2 #1 SMP Sat Jun 28 15:09:48 EEST 2014 x86_64 
GNU/Linux
# btrfs --version
Btrfs v3.14.1
# btrfs fi show
Label: 'label'  uuid: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
         Total devices 1 FS bytes used 411.76GiB
         devid    1 size 465.76GiB used 465.76GiB path /dev/sdb1

Btrfs v3.14.1

* Re: Changing label few times killed filesystem?
  2014-11-21  1:27 Changing label few times killed filesystem? Boris Chernov
@ 2014-11-21  2:20 ` Chris Murphy
  2014-11-21 11:47   ` Duncan
  2014-11-21  4:35 ` Roman Mamedov
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Murphy @ 2014-11-21  2:20 UTC
  To: Btrfs BTRFS

On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com> wrote:

>     Since it failed after "checking extents" I decided to try
> --init-extent-tree:

There might be hope yet if you didn't use --repair, which, as the wiki
says and as has been said many times on this list, is kind of a last
resort. But at the very least, before going with the hammer approach,
you should upgrade your btrfs-progs, which is kind of old. Current is
3.17.2. I suggest upgrading and just posting the results from 'btrfs
check <device>' without any options to see what you get. The check and
--repair code is mostly in btrfs-progs, whereas the mount-time fixing
code is in the kernel. So upgrading btrfs-progs may be sufficient for
your case, but ultimately it might be necessary to go to a newer
kernel also.
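
A minimal sketch of the version check implied above, comparing the installed btrfs-progs against the 3.17.2 release mentioned in this message. version_lt and progs_version are hypothetical helper names, and the parsing assumes that btrfs --version prints a line such as "Btrfs v3.14.1".

```shell
# Hypothetical helper: succeed if dotted version $1 is strictly older than $2.
# Only numeric fields are compared; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1
    }'
}

# Hypothetical helper: extract "3.14.1" from a line like "Btrfs v3.14.1".
progs_version() {
    sed -n 's/^[Bb]trfs[^0-9]*\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Usage:
#   installed=$(btrfs --version | progs_version)
#   version_lt "$installed" 3.17.2 && echo "btrfs-progs $installed is older than 3.17.2"
```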

> Btrfs v3.14.1


-- 
Chris Murphy

* Re: Changing label few times killed filesystem?
  2014-11-21  2:20 ` Chris Murphy
@ 2014-11-21 11:47   ` Duncan
  0 siblings, 0 replies; 10+ messages in thread
From: Duncan @ 2014-11-21 11:47 UTC
  To: linux-btrfs

Chris Murphy posted on Thu, 20 Nov 2014 19:20:22 -0700 as excerpted:

> On Thu, Nov 20, 2014 at 6:27 PM, Boris Chernov <aqs1024@hotmail.com>
> wrote:
> 
>>     Since it failed after "checking extents" I decided to try
>> --init-extent-tree:
> 
> There might be hope yet if you didn't use --repair, which, as the wiki
> says and as has been said many times on this list, is kind of a last
> resort. But at the very least, before going with the hammer approach,
> you should upgrade your btrfs-progs, which is kind of old. Current is
> 3.17.2. I suggest upgrading and just posting the results from 'btrfs
> check <device>' without any options to see what you get. The check and
> --repair code is mostly in btrfs-progs, whereas the mount-time fixing
> code is in the kernel. So upgrading btrfs-progs may be sufficient for
> your case, but ultimately it might be necessary to go to a newer kernel
> also.
> 
>> Btrfs v3.14.1

I'm with Chris here.

Additionally, I note that you (OP) are using kernel 3.15.x, while the 
entire kernel 3.15 series (which wasn't long-term supported so the last 
kernel update was shortly after 3.16 was released) is effectively 
blacklisted for btrfs, as it had a major btrfs bug in the compression 
handling code.  (However, if you are not now and never did use 
compression on that filesystem, that bug shouldn't affect you, but others 
might.)  The same bug was in 3.16.0 and 3.16.1, but was fixed in 3.16.2 
(or was it 3.16.3) plus.  So later 3.16 series kernels should be 
reasonably good.

Unfortunately, 3.17 added another bug, this time with read-only snapshot 
handling.  I don't do snapshots here and have been running it fine, but 
you'll want 3.17.2 plus if you do read-only snapshots.

I've not yet switched to kernel 3.18 series (late development stage at 
this point) here, but while there was a problem in rc4, rc5 appears to be 
good according to reports I've seen.

Meanwhile, userspace-side, there have been a number of fixes to btrfs 
check and the restore code in the 3.16 and 3.17 series, and while running 
the latest userspace isn't as critical as the kernel for normal 
operations (online operations) since for them the kernel is the 
operational code, for fixup (offline operations like btrfs check and 
btrfs restore), you really do want to be running the latest userspace, 
because in that case it's the userspace code that's actually doing the 
work.

Meanwhile, in the other subthread you mentioned not understanding
transid.  FWIW transaction ID and generation are used interchangeably in
btrfs discussions and refer to the same thing -- a monotonically
increasing number that gets bumped every time the root tree and
superblocks are committed.  Normally later generations/transids indicate
later commits and thus a filesystem state closer to current.  Note
that you can use btrfs-show-super to display information from the
superblocks including what it thinks the current generation/transid
should be.

Which brings us to the output.  In most cases when there are problems with
the transid/generation, wanted will be a bit higher than found, something
like found 25456, wanted 25459.  That simply means that the three latest
commits got lost somewhere and you may have to settle for an older one
(which is where the wiki restore article you mentioned comes in).

But there were a number of reports recently where wanted was *MUCH* 
*LOWER* than found (like wanted 5, found 2753), which is what you're 
seeing.  Unfortunately I don't remember the resolution of those reports, 
or indeed, if the bug has been traced yet.
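
The distinction drawn above between "wanted a bit higher than found" and "wanted much lower than found" can be checked mechanically. A small sketch, assuming log lines of the form "parent transid verify failed on <bytenr> wanted <n> found <n>" as they appear in this thread. classify_transid and its labels are made up for illustration.

```shell
# Hypothetical helper: read dmesg or btrfs check output on stdin and, for
# each "parent transid verify failed" line, print "wanted found label".
classify_transid() {
    awk '
    /parent transid verify failed/ {
        wanted = ""; found = ""
        for (i = 1; i < NF; i++) {
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        if (wanted == "" || found == "") next
        if (found + 0 == 0)                { kind = "found-zero" }
        else if (wanted + 0 > found + 0)   { kind = "wanted-higher" }
        else if (wanted + 0 < found + 0)   { kind = "wanted-lower" }
        else                               { kind = "equal" }
        print wanted, found, kind
    }'
}

# Usage:
#   dmesg | classify_transid
```

Here "wanted-higher" corresponds to the common lost-commits case, "wanted-lower" to the pattern seen in this report, and "found-zero" to the zeroed-transid reports.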

There is another bug, however, where the transid was being zeroed:
wanted 26473, found 0.  (Possibly the above was after this one hit, if it
didn't stop further commits in some cases, thus resetting the generation
to zero and increasing it again from there.)  One of the devs mentioned
tracing that one, though again I'm not sure of the current status, except
that they mentioned it so they're obviously working on it.

To my knowledge, these were *NOT* in the context of relabeling, however, 
so it's quite possible you're seeing the one bug, and the relabeling is 
simply coincidence.

Again, however, you're running a 3.15 kernel that's effectively btrfs
blacklisted, and an even older 3.14 userspace.  I can't promise
upgrading will give you an actual fix, but certainly, getting current on
your kernel and userspace will at least get you on the same page as most
folks here, so we know we're not dealing with old and, in the case of the
kernel, known-blacklisted versions, and the bugs in play will at least be
current ones, not long-since-fixed ones.  And for the kernel, avoid the
3.15 series entirely, along with early 3.16 (before 3.16.3) and 3.17
(before 3.17.2), plus early development 3.18 (current rcs should be better).
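
The kernel ranges named above can be written down as a quick check. This is only a sketch of the advice in this message as of November 2014, not an authoritative list. kernel_advice is a hypothetical helper, and release-candidate kernels such as 3.18-rc are not handled.

```shell
# Hypothetical helper: classify a kernel version string according to the
# ranges named in this message (3.15.x, 3.16 before 3.16.3, 3.17 before 3.17.2).
kernel_advice() {
    ver=${1%%-*}                                  # "3.15.0-pf2" -> "3.15.0"
    major=$(printf '%s\n' "$ver" | cut -d. -f1)
    minor=$(printf '%s\n' "$ver" | cut -d. -f2)
    patch=$(printf '%s\n' "$ver" | cut -d. -f3)
    patch=${patch:-0}                             # "3.17" -> patch level 0
    case "$major.$minor" in
        3.15) echo "avoid" ;;
        3.16) if [ "$patch" -lt 3 ]; then echo "avoid"; else echo "ok"; fi ;;
        3.17) if [ "$patch" -lt 2 ]; then echo "avoid"; else echo "ok"; fi ;;
        *)    echo "not-listed" ;;
    esac
}

# Usage:
#   kernel_advice "$(uname -r)"
```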

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Changing label few times killed filesystem?
  2014-11-21  1:27 Changing label few times killed filesystem? Boris Chernov
  2014-11-21  2:20 ` Chris Murphy
@ 2014-11-21  4:35 ` Roman Mamedov
  2014-11-21  8:49   ` Boris Chernov
  2014-11-23 11:00   ` Boris Chernov
  1 sibling, 2 replies; 10+ messages in thread
From: Roman Mamedov @ 2014-11-21  4:35 UTC
  To: Boris Chernov; +Cc: linux-btrfs

On Fri, 21 Nov 2014 01:27:17 +0000
Boris Chernov <aqs1024@hotmail.com> wrote:

> 
>      I changed the file system label a few times in total. When I tried
> to mount it after that, it was no longer mountable:
> 
> # mount /dev/sdb1 /mnt
> mount: Not a directory

I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
Before mounting try things like "ls -la /mnt/", "umount /mnt", etc.
Or simply try mounting somewhere other than /mnt/.
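
The checks suggested above can be sketched as a small shell function that inspects the mount point before blaming the device. check_mountpoint is a hypothetical helper name. The mountpoint utility is only used if it is installed.

```shell
# Hypothetical helper: report the state of a prospective mount point.
check_mountpoint() {
    mp=$1
    if [ ! -e "$mp" ]; then
        echo "missing"
    elif [ ! -d "$mp" ]; then
        echo "not-a-directory"
    elif command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$mp"; then
        echo "already-mounted"
    else
        echo "ok"
    fi
}

# Usage:
#   check_mountpoint /mnt
```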

-- 
With respect,
Roman

* Re: Changing label few times killed filesystem?
  2014-11-21  4:35 ` Roman Mamedov
@ 2014-11-21  8:49   ` Boris Chernov
  2014-11-23 11:00   ` Boris Chernov
  1 sibling, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-21  8:49 UTC
  To: Roman Mamedov; +Cc: linux-btrfs

On 2014-11-21 04:35, Roman Mamedov wrote:
> On Fri, 21 Nov 2014 01:27:17 +0000
> Boris Chernov <aqs1024@hotmail.com> wrote:
>>       I changed the file system label a few times in total. When I tried
>> to mount it after that, it was no longer mountable:
>>
>> # mount /dev/sdb1 /mnt
>> mount: Not a directory
> I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
> Before mounting try things like "ls -la /mnt/", "umount /mnt", etc.
> Or simply try mounting somewhere other than /mnt/.
     Before I attempted mounting to /mnt I tried to mount with KDE
Device Notifier to /media/username/label, then I created a directory
manually in /media/ and tried to mount from the command line, then
tried /mnt, and the error was the same. So I'm sure there is nothing
wrong with my mount points.
     Now I have rebooted and tried to mount with KDE Device Notifier to
/media/username/label. It failed again, so I tried from the command line
as root:

# mkdir /media/sdb1 && ls -la /media/sdb1 && mount /dev/sdb1 /media/sdb1
total 8
drwxr-sr-x 2 root disk 4096 Nov 21 08:12 .
drwsrwsrwT 7 root disk 4096 Nov 21 08:12 ..

     ...and that's it, no output from the mount command (it just hung and
became an unkillable process). Please let me know if there is anything
else I could try, either to restore it or to debug it (to at least
understand why exactly it broke, so it will not happen again to me or
anyone else). If it matters, the disk has a single partition (BTRFS
only), was plugged in all the time, and I use a Xeon-based workstation
with ECC memory. In dmesg I see the following. It seems that after
encountering btrfs bugs in its recovery tools (mentioned in my previous
mail) I have also encountered a btrfs bug in the kernel:

[  339.349260] BTRFS info (device sdb1): disk space caching is enabled
[  339.397438] parent transid verify failed on 29458432 wanted 5 found 2759
[  339.397505] ------------[ cut here ]------------
[  339.397510] kernel BUG at fs/btrfs/locking.c:269!
[  339.397513] invalid opcode: 0000 [#1] SMP
[  339.397517] Modules linked in: ppp_deflate bsd_comp ppp_async 
crc_ccitt ppp_generic slhc snd_aloop snd_hrtimer xt_conntrack 
iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 snd_i2c snd_ice17xx_ak4xxx 
snd_ak4xxx_adda snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm 
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device 
snd_timer snd soundcore ac97_bus vmnet(O) parport_pc parport 
vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) cpufreq_conservative 
cpufreq_powersave cpufreq_userspace cpufreq_stats zram nvidia(PO) 
cfg80211 rfkill binfmt_misc uinput zfs(PO) zunicode(PO) zavl(PO) 
zcommon(PO) znvpair(PO) spl(O) nfsd auth_rpcgss oid_registry nfs_acl nfs 
lockd fscache sunrpc iTCO_wdt iTCO_vendor_support usblp kvm_intel kvm 
ses enclosure cdc_ether psmouse option i2c_i801 pcspkr usbnet mii 
usb_wwan usbserial serio_raw i7core_edac edac_core uvcvideo 
videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media evdev 
joydev jc42 w83627ehf lm90 coretemp adt7475 hwmon_vid adm1021 ttm 
drm_kms_helper drm i2c_algo_bit i2c_core msr loop fuse tpm_infineon 
tpm_tis lpc_ich mfd_core tpm button acpi_cpufreq processor thermal_sys 
autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq usb_storage sg sd_mod 
sr_mod cdrom crc_t10dif crct10dif_common hid_generic usbhid hid ahci 
libahci libata crc32c_intel scsi_mod e1000e ptp pps_core xhci_hcd 
ehci_pci ehci_hcd usbcore usb_common [last unloaded: vmnet]
[  339.397584] CPU: 0 PID: 25752 Comm: mount Tainted: P           O 
3.15.0-pf2 #1
[  339.397585] Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2        
08/19/11
[  339.397586] task: ffff880036c93f80 ti: ffff8805702b4000 task.ti: 
ffff8805702b4000
[  339.397587] RIP: 0010:[<ffffffffa0245050>] [<ffffffffa0245050>] 
btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[  339.397604] RSP: 0018:ffff8805702b7bf0  EFLAGS: 00010246
[  339.397605] RAX: 0000000000000000 RBX: ffff8804db6da800 RCX: 
0000000000000581
[  339.397606] RDX: 0000000000000000 RSI: ffff8804db58d0e0 RDI: 
ffff8804db6da800
[  339.397607] RBP: 0000000000000001 R08: 000000000001b830 R09: 
ffff88063fc1b830
[  339.397608] R10: ffff88061afec700 R11: ffffea00136d6300 R12: 
0000000000000005
[  339.397609] R13: ffff88008c978820 R14: ffff88061af51000 R15: 
ffff8804db6da800
[  339.397610] FS:  00007f55bf45b840(0000) GS:ffff88063fc00000(0000) 
knlGS:0000000000000000
[  339.397612] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  339.397613] CR2: 00007f6b280af000 CR3: 00000004da047000 CR4: 
00000000000007f0
[  339.397614] Stack:
[  339.397614]  ffffffffa024557d ffff8804db6da800 ffffffffa0208838 
0000000000000000
[  339.397616]  0000000000000000 0000000000000000 0000000000000000 
ffff88008c978820
[  339.397617]  ffffffffa02093a0 0000000000001c18 0000000000000005 
ffff8804db6da800
[  339.397619] Call Trace:
[  339.397629]  [<ffffffffa024557d>] ? 
btrfs_tree_read_unlock_blocking+0x8d/0xc0 [btrfs]
[  339.397637]  [<ffffffffa0208838>] ? verify_parent_transid+0x118/0x1a0 
[btrfs]
[  339.397645]  [<ffffffffa02093a0>] ? 
btree_read_extent_buffer_pages.constprop.46+0xc0/0x110 [btrfs]
[  339.397653]  [<ffffffffa020a46e>] ? read_tree_block+0x2e/0x50 [btrfs]
[  339.397662]  [<ffffffffa020b90e>] ? btrfs_read_tree_root+0x10e/0x180 
[btrfs]
[  339.397670]  [<ffffffffa020e745>] ? open_ctree+0x1495/0x1e90 [btrfs]
[  339.397677]  [<ffffffffa01e791d>] ? btrfs_mount+0x6bd/0x880 [btrfs]
[  339.397683]  [<ffffffff81191f71>] ? mount_fs+0x31/0x1b0
[  339.397687]  [<ffffffff811ac63d>] ? vfs_kern_mount+0x5d/0x110
[  339.397690]  [<ffffffff811aecb5>] ? do_mount+0x225/0xa50
[  339.397693]  [<ffffffff811393b8>] ? memdup_user+0x38/0x70
[  339.397695]  [<ffffffff811af7fb>] ? SyS_mount+0x9b/0x110
[  339.397698]  [<ffffffff814de3f9>] ? system_call_fastpath+0x16/0x1b
[  339.397699] Code: ee e0 b9 ea ff ff ff e9 64 ff ff ff 4c 8b a4 24 90 
00 00 00 b9 ea ff ff ff e9 52 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 
1f 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 0b 66 66 66
[  339.397715] RIP  [<ffffffffa0245050>] 
btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[  339.397722]  RSP <ffff8805702b7bf0>
[  339.397822] ---[ end trace 335f63b7cdc66864 ]---
[  341.358672] perf interrupt took too long (2508 > 2500), lowering 
kernel.perf_event_max_sample_rate to 50000

* Re: Changing label few times killed filesystem?
  2014-11-21  4:35 ` Roman Mamedov
  2014-11-21  8:49   ` Boris Chernov
@ 2014-11-23 11:00   ` Boris Chernov
  2014-11-24  2:46     ` Duncan
  1 sibling, 1 reply; 10+ messages in thread
From: Boris Chernov @ 2014-11-23 11:00 UTC
  To: linux-btrfs


> I suggest upgrading and just posting the results from 'btrfs check
> <device>' without any options to see what you get.
     OK, I have upgraded to the 3.17.0 kernel and have also upgraded
btrfs-tools:
# btrfs --version
Btrfs v3.17

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x41a081]
btrfs[0x41a0a5]
btrfs[0x409783]
btrfs[0x40a45e]
btrfs[0x41bfa9]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7feaf251cb45]
btrfs[0x40b497]

     btrfsck /dev/sdb1 gives exactly the same output. It seems it does 
not even try to check anything but just fails on the assertion. I also 
tried btrfs restore:

# btrfs restore /dev/sdb1 /media/backup/sdb1 # Does nothing and exits 
almost immediately
# echo $?
0

     After upgrading to the new kernel, when I try to mount the
partition I get this:

# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

# dmesg | tail
...
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed

> However, if you are not now and never did use compression on that
> filesystem, that bug shouldn't affect you, but others might.
     I did not use compression on this partition, but I have used it on
another btrfs disk (which seems to work fine, at least for now). I do
not think I used any special features on the partition I have trouble
with (I was planning to, but it died before I got a chance).

> it's quite possible you're seeing the one bug, and the relabeling is
> simply coincidence.
     I suppose it is possible that something else was the cause, but
the only other thing I did with the file system at the time was
mounting/unmounting it. Also, I did not use it much, just for a few
weeks; before that the disk was unplugged for a few months (with no
files on it). And the only things I did with it (before it stopped
working) were creating, moving, copying and deleting files.
     Before upgrading btrfs-tools and the kernel I tried to reproduce
the issue by creating a big file containing a btrfs file system, but I
was unable to reproduce the problem. However, I did not put as many
files on it as on the real partition, and it was smaller. In other
words, the issue I have encountered seems to be hard to reproduce, so I
cannot tell with 100% certainty what exactly caused the corruption.


     Is there anything else I can try? If not to restore it, then to
provide more useful debug information (if possible in this case)? I
could try compiling the latest development versions of the kernel and/or
btrfs-tools if there is a chance that might help.


     P.S. By mail I received only the shortest reply, the one about the
"mount" command, so I was able to read the other replies only a few days
later, when they appeared on gmane (I wasn't subscribed at the time
because I did not expect gmane to be so slow). This time I subscribed to
the list, so hopefully I will be able to read all replies without delay.

* Re: Changing label few times killed filesystem?
  2014-11-23 11:00   ` Boris Chernov
@ 2014-11-24  2:46     ` Duncan
  2014-11-25 11:04       ` Boris Chernov
  2014-11-25 16:46       ` Boris Chernov
  0 siblings, 2 replies; 10+ messages in thread
From: Duncan @ 2014-11-24  2:46 UTC
  To: linux-btrfs

Boris Chernov posted on Sun, 23 Nov 2014 11:00:16 +0000 as excerpted:

>  P.S. By mail I received only the shortest reply, the one about the
> "mount" command, so I was able to read the other replies only a few days
> later, when they appeared on gmane (I wasn't subscribed at the time
> because I did not expect gmane to be so slow). This time I subscribed to
> the list, so hopefully I will be able to read all replies without delay.

FWIW I use gmane's list2news service here, and didn't experience such 
delays (maybe a few hours here and there, but...).

However, if you were using gmane's web service, that explains things as 
weaverd, the process that does the threading on the web side, was down 
for some days, and Lars (gmane's owner and primary admin, there's others 
but only Lars is able to do some things) only found out about it when he 
followed up on a report from someone in gmane.discuss.  Check out that 
list/group for more.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Changing label few times killed filesystem?
  2014-11-24  2:46     ` Duncan
@ 2014-11-25 11:04       ` Boris Chernov
  2014-11-25 16:46       ` Boris Chernov
  1 sibling, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-25 11:04 UTC
  To: linux-btrfs

On 2014-11-24 02:46, Duncan wrote:
> if you were using gmane's web service, that explains things as weaverd,
> the process that does the threading on the web side, was down for some
> days
     Yes, I used the gmane blog. Good to know it is not down anymore.

     Back on topic. Even after updating to the latest version, btrfsck
still does not work, with or without options such as --repair. Does
anyone know what "Assertion `rec->is_root` failed" means? Is it worth
trying to compile my own version of btrfsck without this assertion?
     With or without the --repair option, it looks like this assertion
stops btrfsck very early, preventing it from checking the filesystem or
attempting to repair it.

# btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb275f24b45]
btrfs check[0x40b497]

# btrfsck --repair /dev/sdb1
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc5b8dab45]
btrfs check[0x40b497]

* Re: Changing label few times killed filesystem?
  2014-11-24  2:46     ` Duncan
  2014-11-25 11:04       ` Boris Chernov
@ 2014-11-25 16:46       ` Boris Chernov
  2014-11-27 18:27         ` Boris Chernov
  1 sibling, 1 reply; 10+ messages in thread
From: Boris Chernov @ 2014-11-25 16:46 UTC
  To: linux-btrfs

     In an attempt to get more information, I have commented out
BUG_ON(rec->is_root) in cmds-check.c to let btrfsck check my file system
without failing on this assertion. Below you can see the output. I would
appreciate any help or ideas...

# btrfsck /dev/sdb1  # Full log can be downloaded here: 
http://pastebin.com/D68vr69J
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...messages like these repeat many times, download full log to see them 
all...
ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 
not found in extent tree
backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
...
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 not found
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
  referenced 0
Btrfs v3.17

* Re: Changing label few times killed filesystem?
  2014-11-25 16:46       ` Boris Chernov
@ 2014-11-27 18:27         ` Boris Chernov
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Chernov @ 2014-11-27 18:27 UTC
  To: linux-btrfs


     Since nobody had any other suggestions, I decided to try running the
modified btrfsck (without the BUG_ON(rec->is_root) assertion) with the
--repair option.

     Surprisingly, the modified btrfsck --repair fixed all errors but one
(according to btrfsck), and btrfsck asked me to run btrfsck --repair one
more time to fix the remaining error. Mounting still did not work at
this point, so I did what btrfsck suggested. At first it said it had
fixed the remaining error, but then it found many more errors (I am not
sure whether btrfsck caused them or they were already present and fixing
the remaining error just uncovered them).

     btrfs restore (with or without the -t option) returns with a zero
exit code without even attempting to do anything (as it did before I
tried --repair). Mounting with or without the "recovery" option produces
the same errors (they were exactly the same before --repair, so I already
mentioned them in a previous message, but for convenience I include them
again in the log below). "btrfs rescue chunk-recover" and "btrfs rescue
super-recover" say that everything is OK.

     Does anybody have any ideas or suggestions?

     Please do not be afraid to suggest something risky. At this point
I have nothing to lose, because if I cannot restore the files or provide
further debug information for developers, I have to reformat this
partition anyway. Ideas about what could have caused this corruption are
also welcome, because currently I find it hard to believe that relabeling
or mounting/unmounting were the only reasons.

     Below I show exactly what I did, along with parts of the terminal
output (for readability I removed repeated similar messages; please
download the full log if you are interested).

# btrfsck --repair /dev/sdb1  # Full log can be downloaded here:
http://pastebin.com/MdyjxY4w
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
ref mismatch on [20971520 16384] extent item 0, found 1
adding new tree backref on start 20971520 len 16384 parent 3 root 3
Backref 20971520 parent 3 root 3 not found in extent tree
backpointer mismatch on [20971520 16384]
...
owner ref check failed [47529984 16384]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
         unresolved ref dir 6 index 0 namelen 7 name default filetype 0 
errors 3, no dir item, no dir index
Failed to find [30769152, 168, 16384]
btrfs unable to find ref byte nr 30769152 parent 0 root 5  owner 0 offset 1
reset isize for dir 6 root 5
root 5 inode 6 errors 2000, link count wrong
         unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 
errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
  referenced 0
Btrfs v3.17


     To my surprise, btrfsck showed great improvements (after btrfsck
--repair) and asked me to run btrfsck --repair one more time to fix
the remaining error:


# btrfsck /dev/sdb1
root item for root 18446744073709551607, current bytenr 29540352, 
current gen 2758, current level 0, new bytenr 29540352, new gen 
4294967296, new level 1
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.


     Before trying to run btrfsck --repair again, I tried to mount, but 
it did not work:


# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so
# dmesg | tail
...
[268827.386951] BTRFS info (device sdb1): disk space caching is enabled
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.405135] BTRFS: open_ctree failed


     Since btrfsck told me to run it with --repair option again, I did:


# btrfsck --repair /dev/sdb1  # Full log is available here: 
http://pastebin.com/pcWte3Ru
enabling repair mode
fixing root item for root 18446744073709551607, current bytenr 29540352, 
current gen 2758, current level 0, new bytenr 29540352, new gen 
4294967296, new level 1
Fixed 1 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
parent transid verify failed on 29425664 wanted 1087 found 2763
...
Ignoring transid failure
leaf parent key incorrect 29425664
bad block 29425664
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) is not found in 
block group
Chunk[256, 228, 0] stripe[1, 0] is not found in dev extent
...
Dev extent's total-byte(0) is not equal to byte-used(500107771904) in 
dev[1, 216, 1]
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
         unresolved ref dir 6 index 0 namelen 7 name default filetype 0 
errors 3, no dir item, no dir index
root 5 inode 6 errors 2000, link count wrong
         unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 
errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
parent transid verify failed on 29540352 wanted 4294967296 found 2758
Ignoring transid failure
found 453869568 bytes used err is 1
total csum bytes: 0
total tree bytes: 1785856
total fs tree bytes: 16384
total extent tree bytes: 16384
btree space waste bytes: 809878
file data blocks allocated: 0
  referenced 0
Btrfs v3.17


     If I try to mount it again, the error in dmesg remains the same as
before, and btrfsck shows that the errors which appeared after the second
--repair are still present (they can be seen in the log above). I also
tried "btrfs rescue", but this did not make any difference (I still can't
use "btrfs restore" or mount):


# btrfs rescue super-recover /dev/sdb1
All supers are valid, no need to recover

# btrfs rescue chunk-recover /dev/sdb1 -v  # Full log is available here: 
http://pastebin.com/7knR1afA
All Devices:
         Device: id = 1, name = /dev/sdb1

DEVICE SCAN RESULT:
Filesystem Information:
         sectorsize: 4096
         leafsize: 16384
         tree root generation: 2765
         chunk root generation: 952
...
     Bad Chunks:

     Total Chunks:   469
       Heathy:       469
       Bad:  0

     Orphan Block Groups:

     Orphan Device Extents:
     Check chunks successfully with no orphans
     Recover the chunk tree successfully.
