From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.palepurple.co.uk ([89.16.183.188]:46593 "EHLO
	mail.palepurple.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753898AbbLIKrJ (ORCPT );
	Wed, 9 Dec 2015 05:47:09 -0500
Received: from localhost (localhost [127.0.0.1])
	by mail.palepurple.co.uk (Postfix) with ESMTP id 90AE2812B
	for ; Wed, 9 Dec 2015 10:47:07 +0000 (GMT)
Received: from mail.palepurple.co.uk ([127.0.0.1])
	by localhost (mail.palepurple.co.uk [127.0.0.50]) (amavisd-new, port 10024)
	with ESMTP id JSWz_-XfyEUc
	for ; Wed, 9 Dec 2015 10:46:52 +0000 (GMT)
Received: from [172.30.33.200] (host81-133-46-190.in-addr.btopenworld.com [81.133.46.190])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	(Authenticated sender: david@palepurple.co.uk)
	by mail.palepurple.co.uk (Postfix) with ESMTPSA id 3A7C38111
	for ; Wed, 9 Dec 2015 10:46:52 +0000 (GMT)
From: David Goodwin
Subject: kernel 4.1.13 - balance bug?
To: "linux-btrfs@vger.kernel.org"
Message-ID: <5668069C.6010608@codepoets.co.uk>
Date: Wed, 9 Dec 2015 10:46:52 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Hi,

Trying to run a balance on a filesystem results in the below dmesg output.
Kernel is 4.1.13 from kernel.org. System is running in AWS (hence the Xen
stuff).

INFO: task btrfs:28938 blocked for more than 120 seconds.
      Not tainted 4.1.13-dg1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs           D ffff8800eb016840     0 28938  28934 0x00000000
 ffff8800e92694f0 ffffffff810046ff ffff8800e6dbc050 ffff8800ea6f4010
 ffff880000078000 7fffffffffffffff ffff8800ae8f77d8 ffff8800e92694f0
 ffff8800ae19ce20 0000000000000001 ffffffff81574a4f ffff8800ae8f77e0
Call Trace:
 [] ? xen_load_sp0+0x7f/0x130
 [] ? schedule+0x2f/0x80
 [] ? schedule_timeout+0x21a/0x290
 [] ? __schedule+0x2b0/0x930
 [] ? wait_for_completion+0xb5/0x180
 [] ? wake_up_state+0x20/0x20
 [] ? btrfs_async_run_delayed_refs+0x126/0x150 [btrfs]
 [] ? __btrfs_end_transaction+0x27e/0x410 [btrfs]
 [] ? relocate_block_group+0x277/0x6b0 [btrfs]
 [] ? btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
 [] ? btrfs_relocate_chunk.isra.38+0x3e/0xd0 [btrfs]
 [] ? btrfs_balance+0x91c/0xed0 [btrfs]
 [] ? btrfs_ioctl_balance+0x3ad/0x420 [btrfs]
 [] ? btrfs_ioctl+0x570/0x27e0 [btrfs]
 [] ? xen_set_pte_at+0x85/0x2a0
 [] ? handle_mm_fault+0xc0c/0x1640
 [] ? do_vfs_ioctl+0x2e8/0x4f0
 [] ? __do_page_fault+0x1d1/0x490
 [] ? SyS_ioctl+0x81/0xa0
 [] ? system_call_fastpath+0x16/0x75

After rebooting, the same message comes up if I try to resume the balance.

After rebooting again, if I cancel the balance (mount -o ... skip_balance ;
btrfs balance cancel /whatever ) and then run
'btrfs check --repair /dev/whatever' I see messages like :

bad metadata [279388160000, 279388176384) crossing stripe boundary
bad metadata [279388946432, 279388962816) crossing stripe boundary
bad metadata [279389208576, 279389224960) crossing stripe boundary
... (repeat a few times)
repaired damaged extent references
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
reset nbytes for ino 1571860 root 1412
warning line 3597
checking csums
checking root refs
found 538506966512 bytes used err is 0
total csum bytes: 500379820
total tree bytes: 26440007680
total fs tree bytes: 23403429888
total extent tree bytes: 2439593984
btree space waste bytes: 6268486972
file data blocks allocated: 1918116421632
 referenced 1946322268160
btrfs-progs v4.2.3

But after this 'repair' a balance will still not complete, with the same
error as above being generated.

Otherwise the FS seems usable ... so for now I'll just skip running a
balance on it and hope it's fixed in a newer kernel when I eventually
upgrade.

thanks
David.