problem with long running btrfs mount operation

linux-btrfs.vger.kernel.org archive mirror

* problem with long running btrfs mount operation
From: S. Fricke @ 2015-09-22 13:38 UTC
  To: linux-btrfs

Hi,

I have a problem with one of my btrfs HDDs. Mounting it takes more than
135 minutes. Once mounted, it works normally. This is reproducible, but
only with this HDD.

Maybe someone has a clue what is going wrong here.


Silvio

% uname -a
Linux develbox 4.1.6-1-ARCH #1 SMP PREEMPT Mon Aug 17 08:52:28 CEST 2015 x86_64 GNU/Linux
% btrfs --version
btrfs-progs v4.2
% sudo btrfs fi show
Label: none  uuid: 2299f474-aae8-4d43-909c-d69f724ea65d
        Total devices 1 FS bytes used 203.54GiB
        devid    1 size 911.51GiB used 215.04GiB path /dev/sdb2

Label: none  uuid: 4db27f9b-d8fe-4341-985a-4ce55ea9fd25
        Total devices 1 FS bytes used 668.39GiB
        devid    1 size 867.11GiB used 710.06GiB path /dev/sda1
% sudo btrfs fi df /storage
Data, single: total=632.00GiB, used=631.98GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=39.00GiB, used=36.41GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
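
A quick cross-check of the figures above, as a sketch only: summing the
"used" values reported for /storage gives the "FS bytes used" value of the
second filesystem in the `btrfs fi show` output, which suggests (it is not
stated explicitly) that /storage is the filesystem on /dev/sda1. The
function name `sum_used_gib` is made up for illustration.

```shell
#!/bin/sh
# Sum the "used" figures from `btrfs fi df /storage` (values copied from the
# output above) and print the total in GiB with two decimals.
sum_used_gib() {
    # Arguments: data used (GiB), metadata used (GiB), system used (KiB)
    awk -v d="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", d + m + s / 1024 / 1024 }'
}

sum_used_gib 631.98 36.41 96   # compare with "FS bytes used" in `btrfs fi show`
```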

And the btrfs-related parts of dmesg:

[Sep22 12:48] INFO: task btrfs-transacti:3280 blocked for more than 120 seconds.
[  +0.000096]       Tainted: P           O    4.1.6-1-ARCH #1
[  +0.000085] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000117] btrfs-transacti D ffff880ffe55fd68     0  3280      2 0x00000000
[  +0.000004]  ffff880ffe55fd68 ffff881008068000 ffff881006ad8a30 ffff880ffe55fdd8
[  +0.000002]  ffff880ffe560000 ffff880fff1811f0 ffff880fff1811f0 ffff880febdb60b0
[  +0.000002]  0000000000000000 ffff880ffe55fd88 ffffffff81588377 ffff880fff1811f0
[  +0.000002] Call Trace:
[  +0.000008]  [<ffffffff81588377>] schedule+0x37/0x90
[  +0.000013]  [<ffffffffa0f088ef>] wait_current_trans.isra.9+0xcf/0x120 [btrfs]
[  +0.000005]  [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
[  +0.000007]  [<ffffffffa0f0a598>] start_transaction+0x3e8/0x5b0 [btrfs]
[  +0.000006]  [<ffffffffa0f0a817>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[  +0.000006]  [<ffffffffa0f04f8e>] transaction_kthread+0x18e/0x240 [btrfs]
[  +0.000006]  [<ffffffffa0f04e00>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
[  +0.000004]  [<ffffffff81097868>] kthread+0xd8/0xf0
[  +0.000003]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[  +0.000003]  [<ffffffff8158c3a2>] ret_from_fork+0x42/0x70
[  +0.000002]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[Sep22 12:50] INFO: task btrfs-transacti:3280 blocked for more than 120 seconds.
[  +0.000096]       Tainted: P           O    4.1.6-1-ARCH #1
[  +0.000085] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000117] btrfs-transacti D ffff880ffe55fd68     0  3280      2 0x00000000
[  +0.000004]  ffff880ffe55fd68 ffff881008068000 ffff881006ad8a30 ffff880ffe55fdd8
[  +0.000002]  ffff880ffe560000 ffff880fff1811f0 ffff880fff1811f0 ffff880febdb60b0
[  +0.000002]  0000000000000000 ffff880ffe55fd88 ffffffff81588377 ffff880fff1811f0
[  +0.000002] Call Trace:
[  +0.000008]  [<ffffffff81588377>] schedule+0x37/0x90
[  +0.000014]  [<ffffffffa0f088ef>] wait_current_trans.isra.9+0xcf/0x120 [btrfs]
[  +0.000005]  [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
[  +0.000007]  [<ffffffffa0f0a598>] start_transaction+0x3e8/0x5b0 [btrfs]
[  +0.000006]  [<ffffffffa0f0a817>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[  +0.000006]  [<ffffffffa0f04f8e>] transaction_kthread+0x18e/0x240 [btrfs]
[  +0.000006]  [<ffffffffa0f04e00>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
[  +0.000003]  [<ffffffff81097868>] kthread+0xd8/0xf0
[  +0.000002]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[  +0.000003]  [<ffffffff8158c3a2>] ret_from_fork+0x42/0x70
[  +0.000002]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[Sep22 13:07] INFO: task mount:3257 blocked for more than 120 seconds.
[  +0.000092]       Tainted: P           O    4.1.6-1-ARCH #1
[  +0.000086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000116] mount           D ffff880fff70f958     0  3257   3256 0x00000000
[  +0.000004]  ffff880fff70f958 ffff88100879bd20 ffff880fff0a0a30 0000000000000001
[  +0.000003]  ffff880fff710000 ffff880fff1811f0 ffff880fff1811f0 ffff880febd524d0
[  +0.000002]  0000000000000001 ffff880fff70f978 ffffffff81588377 ffff880fff1811f0
[  +0.000002] Call Trace:
[  +0.000008]  [<ffffffff81588377>] schedule+0x37/0x90
[  +0.000014]  [<ffffffffa0f088ef>] wait_current_trans.isra.9+0xcf/0x120 [btrfs]
[  +0.000004]  [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
[  +0.000007]  [<ffffffffa0f0a5eb>] start_transaction+0x43b/0x5b0 [btrfs]
[  +0.000007]  [<ffffffffa0efa910>] ? btrfs_update_root+0xf0/0x290 [btrfs]
[  +0.000007]  [<ffffffffa0f0a77b>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[  +0.000006]  [<ffffffffa0ef62fb>] btrfs_drop_snapshot+0x57b/0x8a0 [btrfs]
[  +0.000008]  [<ffffffffa0f610b9>] merge_reloc_roots+0xe9/0x280 [btrfs]
[  +0.000007]  [<ffffffffa0f62002>] btrfs_recover_relocation+0x3b2/0x460 [btrfs]
[  +0.000006]  [<ffffffffa0f06e45>] open_ctree+0x1ab5/0x21c0 [btrfs]
[  +0.000005]  [<ffffffffa0eda8bf>] btrfs_mount+0x83f/0x910 [btrfs]
[  +0.000004]  [<ffffffff812de5e5>] ? find_next_bit+0x15/0x30
[  +0.000002]  [<ffffffff81188a7a>] ? pcpu_alloc+0x39a/0x6a0
[  +0.000003]  [<ffffffff811e5078>] mount_fs+0x38/0x190
[  +0.000002]  [<ffffffff81188db5>] ? __alloc_percpu+0x15/0x20
[  +0.000002]  [<ffffffff8120139b>] vfs_kern_mount+0x6b/0x120
[  +0.000003]  [<ffffffff81203c9a>] do_mount+0x24a/0xd40
[  +0.000002]  [<ffffffff81204ae3>] SyS_mount+0xa3/0x110
[  +0.000003]  [<ffffffff8158bfae>] system_call_fastpath+0x12/0x71
[Sep22 13:09] INFO: task mount:3257 blocked for more than 120 seconds.
[  +0.000093]       Tainted: P           O    4.1.6-1-ARCH #1
[  +0.000086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000116] mount           D ffff880fff70f958     0  3257   3256 0x00000000
[  +0.000004]  ffff880fff70f958 ffff88100879bd20 ffff880fff0a0a30 0000000000000001
[  +0.000002]  ffff880fff710000 ffff880fff1811f0 ffff880fff1811f0 ffff880febd524d0
[  +0.000002]  0000000000000001 ffff880fff70f978 ffffffff81588377 ffff880fff1811f0
[  +0.000002] Call Trace:
[  +0.000008]  [<ffffffff81588377>] schedule+0x37/0x90
[  +0.000014]  [<ffffffffa0f088ef>] wait_current_trans.isra.9+0xcf/0x120 [btrfs]
[  +0.000005]  [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
[  +0.000007]  [<ffffffffa0f0a5eb>] start_transaction+0x43b/0x5b0 [btrfs]
[  +0.000007]  [<ffffffffa0efa910>] ? btrfs_update_root+0xf0/0x290 [btrfs]
[  +0.000006]  [<ffffffffa0f0a77b>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[  +0.000007]  [<ffffffffa0ef62fb>] btrfs_drop_snapshot+0x57b/0x8a0 [btrfs]
[  +0.000007]  [<ffffffffa0f610b9>] merge_reloc_roots+0xe9/0x280 [btrfs]
[  +0.000007]  [<ffffffffa0f62002>] btrfs_recover_relocation+0x3b2/0x460 [btrfs]
[  +0.000007]  [<ffffffffa0f06e45>] open_ctree+0x1ab5/0x21c0 [btrfs]
[  +0.000005]  [<ffffffffa0eda8bf>] btrfs_mount+0x83f/0x910 [btrfs]
[  +0.000003]  [<ffffffff812de5e5>] ? find_next_bit+0x15/0x30
[  +0.000002]  [<ffffffff81188a7a>] ? pcpu_alloc+0x39a/0x6a0
[  +0.000004]  [<ffffffff811e5078>] mount_fs+0x38/0x190
[  +0.000001]  [<ffffffff81188db5>] ? __alloc_percpu+0x15/0x20
[  +0.000003]  [<ffffffff8120139b>] vfs_kern_mount+0x6b/0x120
[  +0.000002]  [<ffffffff81203c9a>] do_mount+0x24a/0xd40
[  +0.000003]  [<ffffffff81204ae3>] SyS_mount+0xa3/0x110
[  +0.000003]  [<ffffffff8158bfae>] system_call_fastpath+0x12/0x71
[Sep22 14:53] ------------[ cut here ]------------
[  +0.000018] WARNING: CPU: 1 PID: 3257 at fs/btrfs/extent-tree.c:6283 __btrfs_free_extent+0x9a1/0xc80 [btrfs]()
[  +0.000001] Modules linked in: nls_utf8 ntfs uas usb_storage pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) fuse bridge stp llc msr ax88179_178a usbnet btrfs mii joydev mousedev xor raid6_pq intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_hdmi kvm_i
[  +0.000044]  oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4 crc16 mbcache jbd2 sd_mod hid_generic usbhid hid atkbd libps2 isci ahci libsas libahci scsi_transport_sas xhci_pci xhci_hcd ehci_pci ehci_hcd libata usbcore usb_common scsi_mod i8042 serio button
[  +0.000020] CPU: 1 PID: 3257 Comm: mount Tainted: P           O    4.1.6-1-ARCH #1
[  +0.000002] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[  +0.000001]  0000000000000000 00000000015cd577 ffff880fff70f708 ffffffff815866ee
[  +0.000003]  0000000000000000 0000000000000000 ffff880fff70f748 ffffffff81078cba
[  +0.000001]  0000000000001000 000001bc7a3b9000 ffff880ffe4ee000 ffff880f0738ab40
[  +0.000003] Call Trace:
[  +0.000006]  [<ffffffff815866ee>] dump_stack+0x4c/0x6e
[  +0.000004]  [<ffffffff81078cba>] warn_slowpath_common+0x8a/0xc0
[  +0.000001]  [<ffffffff81078dea>] warn_slowpath_null+0x1a/0x20
[  +0.000006]  [<ffffffffa0eee081>] __btrfs_free_extent+0x9a1/0xc80 [btrfs]
[  +0.000007]  [<ffffffffa0ef2f17>] __btrfs_run_delayed_refs+0xa17/0x1310 [btrfs]
[  +0.000004]  [<ffffffff812f358d>] ? __percpu_counter_add+0x5d/0xa0
[  +0.000005]  [<ffffffffa0eeb17a>] ? add_pinned_bytes+0x4a/0x60 [btrfs]
[  +0.000006]  [<ffffffffa0ef3fb7>] ? walk_up_proc+0xd7/0x500 [btrfs]
[  +0.000007]  [<ffffffffa0ef7c53>] btrfs_run_delayed_refs.part.36+0x73/0x270 [btrfs]
[  +0.000006]  [<ffffffffa0ef7e65>] btrfs_run_delayed_refs+0x15/0x30 [btrfs]
[  +0.000007]  [<ffffffffa0f08eea>] btrfs_should_end_transaction+0x5a/0x60 [btrfs]
[  +0.000007]  [<ffffffffa0ef61d5>] btrfs_drop_snapshot+0x455/0x8a0 [btrfs]
[  +0.000007]  [<ffffffffa0f610b9>] merge_reloc_roots+0xe9/0x280 [btrfs]
[  +0.000007]  [<ffffffffa0f62002>] btrfs_recover_relocation+0x3b2/0x460 [btrfs]
[  +0.000007]  [<ffffffffa0f06e45>] open_ctree+0x1ab5/0x21c0 [btrfs]
[  +0.000005]  [<ffffffffa0eda8bf>] btrfs_mount+0x83f/0x910 [btrfs]
[  +0.000003]  [<ffffffff812de5e5>] ? find_next_bit+0x15/0x30
[  +0.000002]  [<ffffffff81188a7a>] ? pcpu_alloc+0x39a/0x6a0
[  +0.000003]  [<ffffffff811e5078>] mount_fs+0x38/0x190
[  +0.000002]  [<ffffffff81188db5>] ? __alloc_percpu+0x15/0x20
[  +0.000003]  [<ffffffff8120139b>] vfs_kern_mount+0x6b/0x120
[  +0.000002]  [<ffffffff81203c9a>] do_mount+0x24a/0xd40
[  +0.000002]  [<ffffffff81204ae3>] SyS_mount+0xa3/0x110
[  +0.000003]  [<ffffffff8158bfae>] system_call_fastpath+0x12/0x71
[  +0.000002] ---[ end trace 8284bbbaadd0330d ]---
[Sep22 14:55] BTRFS: checking UUID tree
[  +0.000076] BTRFS info (device sda1): continuing balance
[  +0.224293] BTRFS info (device sda1): relocating block group 1910283173888 flags 36
[Sep22 14:58] ------------[ cut here ]------------
[  +0.000020] WARNING: CPU: 1 PID: 7584 at fs/btrfs/extent-tree.c:6283 __btrfs_free_extent+0x9a1/0xc80 [btrfs]()
[  +0.000002] Modules linked in: nls_utf8 ntfs uas usb_storage pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) fuse bridge stp llc msr ax88179_178a usbnet btrfs mii joydev mousedev xor raid6_pq intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_hdmi kvm_i
[  +0.000043]  oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4 crc16 mbcache jbd2 sd_mod hid_generic usbhid hid atkbd libps2 isci ahci libsas libahci scsi_transport_sas xhci_pci xhci_hcd ehci_pci ehci_hcd libata usbcore usb_common scsi_mod i8042 serio button
[  +0.000020] CPU: 1 PID: 7584 Comm: btrfs-balance Tainted: P        W  O    4.1.6-1-ARCH #1
[  +0.000002] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[  +0.000002]  0000000000000000 000000009d52dc0b ffff880eb6b5f808 ffffffff815866ee
[  +0.000002]  0000000000000000 0000000000000000 ffff880eb6b5f848 ffffffff81078cba
[  +0.000002]  0000000000001000 000001bc7a3b9000 ffff880ffe4ee000 ffff88009ffbb3f0
[  +0.000003] Call Trace:
[  +0.000006]  [<ffffffff815866ee>] dump_stack+0x4c/0x6e
[  +0.000003]  [<ffffffff81078cba>] warn_slowpath_common+0x8a/0xc0
[  +0.000002]  [<ffffffff81078dea>] warn_slowpath_null+0x1a/0x20
[  +0.000006]  [<ffffffffa0eee081>] __btrfs_free_extent+0x9a1/0xc80 [btrfs]
[  +0.000007]  [<ffffffffa0ef2f17>] __btrfs_run_delayed_refs+0xa17/0x1310 [btrfs]
[  +0.000006]  [<ffffffffa0ef3fb7>] ? walk_up_proc+0xd7/0x500 [btrfs]
[  +0.000007]  [<ffffffffa0ef7c53>] btrfs_run_delayed_refs.part.36+0x73/0x270 [btrfs]
[  +0.000006]  [<ffffffffa0ef7e65>] btrfs_run_delayed_refs+0x15/0x30 [btrfs]
[  +0.000007]  [<ffffffffa0f08eea>] btrfs_should_end_transaction+0x5a/0x60 [btrfs]
[  +0.000007]  [<ffffffffa0ef61d5>] btrfs_drop_snapshot+0x455/0x8a0 [btrfs]
[  +0.000008]  [<ffffffffa0f610b9>] merge_reloc_roots+0xe9/0x280 [btrfs]
[  +0.000006]  [<ffffffffa0f614be>] relocate_block_group+0x26e/0x720 [btrfs]
[  +0.000007]  [<ffffffffa0f61b46>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[  +0.000007]  [<ffffffffa0f33a9e>] btrfs_relocate_chunk.isra.20+0x3e/0xc0 [btrfs]
[  +0.000007]  [<ffffffffa0f352e4>] btrfs_balance+0xa04/0xf90 [btrfs]
[  +0.000007]  [<ffffffffa0f358cd>] balance_kthread+0x5d/0x80 [btrfs]
[  +0.000006]  [<ffffffffa0f35870>] ? btrfs_balance+0xf90/0xf90 [btrfs]
[  +0.000004]  [<ffffffff81097868>] kthread+0xd8/0xf0
[  +0.000002]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[  +0.000004]  [<ffffffff8158c3a2>] ret_from_fork+0x42/0x70
[  +0.000002]  [<ffffffff81097790>] ? kthread_worker_fn+0x170/0x170
[  +0.000001] ---[ end trace 8284bbbaadd0330e ]---
[Sep22 15:03] BTRFS info (device sda1): found 11029 extents



-- 
-- S. Fricke ---------------------------------------- silvio@port1024.net --
   Diplom-Informatiker (FH)
   Linux-Entwicklung             JABBER: silvio@conversation.port1024.net   
----------------------------------------------------------------------------


* Re: problem with long running btrfs mount operation
From: Holger Hoffstätte @ 2015-09-22 13:53 UTC
  To: S. Fricke, linux-btrfs

On 09/22/15 15:38, S. Fricke wrote:
> I have a problem with one of my btrfs HDDs. Mounting it takes more than
> 135 minutes. Once mounted, it works normally. This is reproducible, but
> only with this HDD.
> 
> Maybe someone has a clue what is going wrong here.

On remount it tries to continue a previously started balance, which fails with
an error (as can be seen at the end of your log). You should:

a) immediately stop what you're doing on that fs and unmount it
b) get btrfs-progs-4.2.1 (not 4.2) and see what it says

Depending on the outcome of b) you can use -o skip_balance on the next mount.
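
The steps above could translate into roughly the following sequence. This
is a sketch, not something tested on the affected machine: the device
/dev/sda1 and the mount point /storage are taken from the original report,
the check in step b) is clarified further down the thread as `btrfs check`,
and the helper names `run` and `recover_steps` are invented here. By default
the script only prints the commands; set DRY_RUN=0 to execute them as root.

```shell
#!/bin/sh
# Sketch of the suggested recovery sequence. Prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # dry run: show the command only
    else
        "$@"                 # real run: needs root
    fi
}

recover_steps() {
    dev="$1"; mnt="$2"
    run umount "$mnt" &&                        # a) stop using the fs
    run btrfs --version &&                      # b) confirm btrfs-progs is 4.2.1 or newer
    run btrfs check "$dev" &&                   #    read-only unless --repair is given
    run mount -o skip_balance "$dev" "$mnt" &&  # do not resume the interrupted balance
    run btrfs balance status "$mnt"             # expected to show a paused balance
}

recover_steps /dev/sda1 /storage
```

In real mode the chain stops at the first failing command, so a failed
check does not lead straight to a mount.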

-h



* Re: problem with long running btrfs mount operation
From: Richard Michael @ 2015-09-22 14:31 UTC
  To: Holger Hoffstätte; +Cc: S. Fricke, Linux Btrfs

On Tue, Sep 22, 2015 at 9:53 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On 09/22/15 15:38, S. Fricke wrote:
>> I have a problem with one of my btrfs HDDs. Mounting it takes more than
>> 135 minutes. Once mounted, it works normally. This is reproducible, but
>> only with this HDD.
>>
>> Maybe someone has a clue what is going wrong here.
>
> On remount it tries to continue a previously started balance, which fails with
> an error (as can be seen at the end of your log). You should:
>
> a) immediately stop what you're doing on that fs and unmount it
> b) get btrfs-progs-4.2.1 (not 4.2) and see what it says
>
> Depending on the outcome of b) you can use -o skip_balance on the next mount.
>
> -h

I'm curious, do the relevant INFO lines also appear earlier in the
log, near the time the mount began?

These:

BTRFS: checking UUID tree
[  +0.000076] BTRFS info (device sda1): continuing balance


Otherwise, it requires an unknown amount of patience to diagnose this
problem -- how long should I wait before giving up?

It seems to me, from a sysadmin's perspective, the log messages are
reversed (hung task first; 2+ hrs later, BTRFS tells me what it's doing).
I'd expect BTRFS to log a "continuing balance" message as soon as I
mount it, then it's blocked [for as long as I care to wait], but I can
immediately tell what's happened.
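
One way to check this ordering in a saved log is sketched below. It assumes
a kernel log in the format shown in the original report (for example piped
from `dmesg`); the function name `btrfs_log_summary` is made up here.

```shell
#!/bin/sh
# Read a kernel log on stdin and report how many hung-task warnings occurred
# and on which input line the "continuing balance" message first appears.
btrfs_log_summary() {
    awk '
        /blocked for more than [0-9]+ seconds/ { hung++ }
        /continuing balance/ && !first        { first = NR }
        END {
            printf "hung-task warnings: %d\n", hung
            if (first) printf "continuing balance first seen at line %d\n", first
            else       print  "continuing balance not logged"
        }'
}

# Usage: dmesg | btrfs_log_summary
```

If the balance message only shows up after all the hung-task warnings, the
log has the reversed ordering described above.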

Regards,
Richard


* Re: problem with long running btrfs mount operation
From: S. Fricke @ 2015-09-22 14:43 UTC
  To: Richard Michael; +Cc: Holger Hoffstätte, Linux Btrfs

Hi,

> On Tue, Sep 22, 2015 at 9:53 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
> > On 09/22/15 15:38, S. Fricke wrote:
> >> I have a problem with one of my btrfs HDDs. Mounting it takes more than
> >> 135 minutes. Once mounted, it works normally. This is reproducible, but
> >> only with this HDD.
> >>
> >> Maybe someone has a clue what is going wrong here.
> >
> > On remount it tries to continue a previously started balance, which fails with
> > an error (as can be seen at the end of your log). You should:
> >
> > a) immediately stop what you're doing on that fs and unmount it
> > b) get btrfs-progs-4.2.1 (not 4.2) and see what it says

What should it tell me? Do I have to run a command? Should I try to mount the
device again after I have unmounted it? BTW: Is btrfs-progs needed for mounting?

> >
> > Depending on the outcome of b) you can use -o skip_balance on the next mount.
> >
> > -h
> 
> I'm curious, do the relevant INFO lines also appear earlier in the
> log, near the time the mount began?
> 
> These:
> 
> BTRFS: checking UUID tree
> [  +0.000076] BTRFS info (device sda1): continuing balance
> 
> 
> Otherwise, it requires an unknown amount of patience to diagnose this
> problem -- how long should I wait before giving up?
> 
> It seems to me, from a sysadmin's perspective, the log messages are
> reversed (hung task first; 2+ hrs later, BTRFS tells me what it's doing).
> I'd expect BTRFS to log a "continuing balance" message as soon as I
> mount it, then it's blocked [for as long as I care to wait], but I can
> immediately tell what's happened.

That's my general problem with btrfs and other filesystems: when there is a
glitch, I have to wait, but I have no way to see what is going on on
drive X. Okay, I could use SystemTap; does someone have good scripts for
such a use case? Or are there other tools?
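
A possible starting point without SystemTap, as a sketch: tasks stuck in
uninterruptible sleep show state D in `ps`, and on kernels that expose it,
root can read a task's kernel stack from /proc/<pid>/stack. The function
name `d_state_tasks` is invented here; it filters `ps -eo pid,stat,comm`
style output given on stdin and prints only the first word of the command.

```shell
#!/bin/sh
# Print "PID COMMAND" for every task whose state starts with D
# (uninterruptible sleep), reading `ps -eo pid,stat,comm` output from stdin.
d_state_tasks() {
    # NR > 1 skips the header line printed by ps
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage (as root, to also dump each blocked task's kernel stack):
#   ps -eo pid,stat,comm | d_state_tasks | while read -r pid comm; do
#       echo "== $comm ($pid)"; cat "/proc/$pid/stack"
#   done
```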

Regards,
Silvio



* Re: problem with long running btrfs mount operation
From: Holger Hoffstätte @ 2015-09-22 14:57 UTC
  Cc: Linux Btrfs

On 09/22/15 16:43, S. Fricke wrote:
>> On Tue, Sep 22, 2015 at 9:53 AM, Holger Hoffstätte
>> <holger.hoffstaette@googlemail.com> wrote:
>>> On 09/22/15 15:38, S. Fricke wrote:
>>>> I have a problem with one of my btrfs HDDs. Mounting it takes more than
>>>> 135 minutes. Once mounted, it works normally. This is reproducible, but
>>>> only with this HDD.
>>>>
>>>> Maybe someone has a clue what is going wrong here.
>>>
>>> On remount it tries to continue a previously started balance, which fails with
>>> an error (as can be seen at the end of your log). You should:
>>>
>>> a) immediately stop what you're doing on that fs and unmount it
>>> b) get btrfs-progs-4.2.1 (not 4.2) and see what it says
> 
> What should it tell me? Do I have to run a command? Should I try to mount the
> device again after I have unmounted it? BTW: Is btrfs-progs needed for mounting?

I should have been clearer and said "run btrfs check", which I originally
intended to write but then somehow didn't. Sorry. :-)

btrfs-progs is not needed for mounting, only for administrative commands.

-h



* Re: problem with long running btrfs mount operation
From: Hugo Mills @ 2015-09-22 15:13 UTC
  To: Holger Hoffstätte; +Cc: Linux Btrfs

On Tue, Sep 22, 2015 at 04:57:35PM +0200, Holger Hoffstätte wrote:
> On 09/22/15 16:43, S. Fricke wrote:
> >> On Tue, Sep 22, 2015 at 9:53 AM, Holger Hoffstätte
> >> <holger.hoffstaette@googlemail.com> wrote:
> >>> On 09/22/15 15:38, S. Fricke wrote:
> >>>> I have a problem with one of my btrfs HDDs. Mounting it takes more than
> >>>> 135 minutes. Once mounted, it works normally. This is reproducible, but
> >>>> only with this HDD.
> >>>>
> >>>> Maybe someone has a clue what is going wrong here.
> >>>
> >>> On remount it tries to continue a previously started balance, which fails with
> >>> an error (as can be seen at the end of your log). You should:
> >>>
> >>> a) immediately stop what you're doing on that fs and unmount it
> >>> b) get btrfs-progs-4.2.1 (not 4.2) and see what it says
> > 
> > What should it tell me? Do I have to run a command? Should I try to mount the
> > device again after I have unmounted it? BTW: Is btrfs-progs needed for mounting?
> 
> I should have been clearer and said "run btrfs check", which I originally
> intended to write but then somehow didn't. Sorry. :-)
> 
> btrfs-progs is not needed for mounting, only for administrative commands.

   Well, technically, it's useful (but not absolutely essential) to
have btrfs dev scan for setting up multi-device filesystems before
mount.

   Hugo.

-- 
Hugo Mills             | What's a Nazgûl like you doing in a place like this?
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                                Illiad


* Re: problem with long running btrfs mount operation
From: S. Fricke @ 2015-09-23  5:42 UTC
  To: Richard Michael; +Cc: Holger Hoffstätte, Linux Btrfs

Hi,

>>> On Tue, Sep 22, 2015 at 9:53 AM, Holger Hoffstätte <holger.hoffstaette@googlemail.com> wrote:
>>>> On 09/22/15 15:38, S. Fricke wrote:
>>>>> I have a problem with one of my btrfs HDDs. Mounting it takes more than
>>>>> 135 minutes. Once mounted, it works normally. This is reproducible, but
>>>>> only with this HDD.
>>>>>
>>>>> Maybe someone has a clue what is going wrong here.
>>>>
>>>> On remount it tries to continue a previously started balance, which fails with
>>>> an error (as can be seen at the end of your log). You should:
>>>>
>>>> a) immediately stop what you're doing on that fs and unmount it
>>>> b) get btrfs-progs-4.2.1 (not 4.2) and see what it says
>> 
>> What should it tell me? Do I have to run a command? Should I try to mount the
>> device again after I have unmounted it? BTW: Is btrfs-progs needed for mounting?
> 
> I should have been clearer and said "run btrfs check", which I originally
> intended to write but then somehow didn't. Sorry. :-)

The 'btrfs check' took some time. Here is the output. Any advice for me?

Best Regards,
Silvio

% sudo btrfs check /dev/sda1
Checking filesystem on /dev/sda1
UUID: 4db27f9b-d8fe-4341-985a-4ce55ea9fd25
checking extents
bad metadata [1634295414784, 1634295418880) crossing stripe boundary
bad metadata [1634394767360, 1634394771456) crossing stripe boundary
bad metadata [1634691842048, 1634691846144) crossing stripe boundary
bad metadata [1634770485248, 1634770489344) crossing stripe boundary
bad metadata [1634798141440, 1634798145536) crossing stripe boundary
[... many lines cut ...]
bad metadata [1908346978304, 1908346982400) crossing stripe boundary
bad metadata [1908597653504, 1908597657600) crossing stripe boundary
bad metadata [1908663386112, 1908663390208) crossing stripe boundary
bad metadata [1908664041472, 1908664045568) crossing stripe boundary
bad metadata [1908832010240, 1908832014336) crossing stripe boundary
bad metadata [1908848984064, 1908848988160) crossing stripe boundary
ref mismatch on [1908883419136 4096] extent item 1, found 0
Backref 1908883419136 parent 1890809671680 not referenced back 0x4fadc740
Incorrect global backref count on 1908883419136 found 1 wanted 0
backpointer mismatch on [1908883419136 4096]
owner ref check failed [1908883419136 4096]
bad metadata [1908961640448, 1908961644544) crossing stripe boundary
bad metadata [1909004566528, 1909004570624) crossing stripe boundary
bad metadata [1909015052288, 1909015056384) crossing stripe boundary
bad metadata [1909015183360, 1909015187456) crossing stripe boundary
Backref 1909016203264 parent 1890809671680 not referenced back 0x4fc37690
Backref 1909016203264 parent 5105 root 5105 not found in extent tree
Incorrect global backref count on 1909016203264 found 3 wanted 2
backpointer mismatch on [1909016203264 4096]
bad metadata [1909017804800, 1909017808896) crossing stripe boundary
bad metadata [1909144289280, 1909144293376) crossing stripe boundary
bad metadata [1909164539904, 1909164544000) crossing stripe boundary
bad metadata [1909445033984, 1909445038080) crossing stripe boundary
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 716978874897 bytes used err is 0
total csum bytes: 662660776
total tree bytes: 39097315328
total fs tree bytes: 36827172864
total extent tree bytes: 1529536512
btree space waste bytes: 11582963983
file data blocks allocated: 1258206822400
 referenced 1158920425472
btrfs-progs v4.2.1
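
Because the output is long, a per-category count can make it easier to
discuss. A minimal sketch, assuming the output of `btrfs check` was saved to
a file or is piped in; the function name `check_summary` is hypothetical and
the patterns are simply the message texts seen above.

```shell
#!/bin/sh
# Count selected message types in `btrfs check` output read from stdin.
check_summary() {
    awk '
        /crossing stripe boundary/ { stripe++ }
        /backpointer mismatch/     { backptr++ }
        /ref mismatch/             { refmis++ }
        END {
            printf "crossing stripe boundary: %d\n", stripe
            printf "backpointer mismatch: %d\n", backptr
            printf "ref mismatch: %d\n", refmis
        }'
}

# Usage: sudo btrfs check /dev/sda1 2>&1 | check_summary
```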



* Re: problem with long running btrfs mount operation
From: Holger Hoffstätte @ 2015-09-23  9:36 UTC
  To: Linux Btrfs, S. Fricke

On 09/23/15 07:42, S. Fricke wrote:
> 
> The 'btrfs check' took some time. Here is the output. Any advice for me?
> 
> 
> Best Regards,
> Silvio
> 
> % sudo btrfs check /dev/sda1
> Checking filesystem on /dev/sda1
> UUID: 4db27f9b-d8fe-4341-985a-4ce55ea9fd25
> checking extents
> bad metadata [1634295414784, 1634295418880) crossing stripe boundary
> bad metadata [1634394767360, 1634394771456) crossing stripe boundary
> bad metadata [1634691842048, 1634691846144) crossing stripe boundary
> bad metadata [1634770485248, 1634770489344) crossing stripe boundary
> bad metadata [1634798141440, 1634798145536) crossing stripe boundary
> [... many lines cut ...]

Btw, these messages are the reason I wanted you to run progs-4.2.1,
since 4.2(.0) would print them even if the problem wasn't there.
But it looks like you really do have the problem. This is a converted fs,
right?

Unfortunately this problem is currently only detected, but not yet
really fixable (see [1] for details), so I don't think running check
with --repair is going to help. However, it might bring the rest of the
fs back into a workable state (remember to use -o skip_balance!) so that
you can back up whatever needs rescuing.

A last resort would be to mount read-only (-o ro), which will prevent the
fs from COWing itself deeper into that particular hole.

FWIW this particular bug with convert creating borked filesystems should
be fixed now, so just try again. :)

Maybe someone else has a better suggestion for recovery.

-h

[1] https://github.com/kdave/btrfs-progs/commit/595c57d2f4dd3199aacb23b4c68d6aff49f97d29

> bad metadata [1908346978304, 1908346982400) crossing stripe boundary
> bad metadata [1908597653504, 1908597657600) crossing stripe boundary
> bad metadata [1908663386112, 1908663390208) crossing stripe boundary
> bad metadata [1908664041472, 1908664045568) crossing stripe boundary
> bad metadata [1908832010240, 1908832014336) crossing stripe boundary
> bad metadata [1908848984064, 1908848988160) crossing stripe boundary
> ref mismatch on [1908883419136 4096] extent item 1, found 0
> Backref 1908883419136 parent 1890809671680 not referenced back 0x4fadc740
> Incorrect global backref count on 1908883419136 found 1 wanted 0
> backpointer mismatch on [1908883419136 4096]
> owner ref check failed [1908883419136 4096]
> bad metadata [1908961640448, 1908961644544) crossing stripe boundary
> bad metadata [1909004566528, 1909004570624) crossing stripe boundary
> bad metadata [1909015052288, 1909015056384) crossing stripe boundary
> bad metadata [1909015183360, 1909015187456) crossing stripe boundary
> Backref 1909016203264 parent 1890809671680 not referenced back 0x4fc37690
> Backref 1909016203264 parent 5105 root 5105 not found in extent tree
> Incorrect global backref count on 1909016203264 found 3 wanted 2
> backpointer mismatch on [1909016203264 4096]
> bad metadata [1909017804800, 1909017808896) crossing stripe boundary
> bad metadata [1909144289280, 1909144293376) crossing stripe boundary
> bad metadata [1909164539904, 1909164544000) crossing stripe boundary
> bad metadata [1909445033984, 1909445038080) crossing stripe boundary
> Errors found in extent allocation tree or chunk allocation
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 716978874897 bytes used err is 0
> total csum bytes: 662660776
> total tree bytes: 39097315328
> total fs tree bytes: 36827172864
> total extent tree bytes: 1529536512
> btree space waste bytes: 11582963983
> file data blocks allocated: 1258206822400
>  referenced 1158920425472
> btrfs-progs v4.2.1


