Message-ID: <5422D4E0.8090605@nuclearwinter.com>
Date: Wed, 24 Sep 2014 09:27:44 -0500
From: Larkin Lowrey
To: linux-btrfs@vger.kernel.org
Subject: btrfsck check infinite loop

I ran 'btrfs check --repair --init-extent-tree' and it appears to be stuck in an infinite loop. It performed heavy I/O for about 1.5 hours, then the I/O stopped and the CPU stayed at 100%. It has been like that for more than 12 hours now.

I made a hardware change last week that left the RAM unstable, so I suspect some corrupt data was written to disk.

I tried mounting with -o recovery,clear_cache,nospace_cache but got a panic shortly thereafter. I tried 'btrfs check --repair' but also got a panic. I finally tried 'btrfs check --repair --init-extent-tree' and hit an assertion failure with btrfs-progs 3.16. After noticing some promising commits, I built from the integration repo (kdave), re-ran (v3.16.1) and got further (2 hours), but then got stuck in this infinite loop.
Here's the backtrace of where it is now and has been for hours:

#0  0x0000000000438f01 in free_some_buffers (tree=0xda3078) at extent_io.c:553
#1  __alloc_extent_buffer (blocksize=4096, bytenr=, tree=0xda3078) at extent_io.c:592
#2  alloc_extent_buffer (tree=0xda3078, bytenr=, blocksize=4096) at extent_io.c:671
#3  0x000000000042be29 in btrfs_find_create_tree_block (root=root@entry=0xda34a0, bytenr=, blocksize=) at disk-io.c:133
#4  0x000000000042d683 in read_tree_block (root=0xda34a0, bytenr=, blocksize=, parent_transid=161580) at disk-io.c:260
#5  0x0000000000427c58 in read_node_slot (root=root@entry=0xda34a0, parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
#6  0x0000000000428558 in push_leaf_right (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, path=path@entry=0xde317a0, data_size=data_size@entry=67, empty=empty@entry=0) at ctree.c:1608
#7  0x0000000000428e4c in split_leaf (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, ins_key=ins_key@entry=0x7fff24da24b0, path=path@entry=0xde317a0, data_size=data_size@entry=67, extend=extend@entry=0) at ctree.c:1977
#8  0x000000000042aa54 in btrfs_search_slot (trans=0xe709b0, root=root@entry=0xda34a0, key=key@entry=0x7fff24da24b0, p=p@entry=0xde317a0, ins_len=ins_len@entry=67, cow=cow@entry=1) at ctree.c:1120
#9  0x000000000042af51 in btrfs_insert_empty_items (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, path=path@entry=0xde317a0, cpu_key=cpu_key@entry=0x7fff24da24b0, data_size=data_size@entry=0x7fff24da24a0, nr=nr@entry=1) at ctree.c:2412
#10 0x00000000004175f6 in btrfs_insert_empty_item (data_size=42, key=0x7fff24da24b0, path=0xde317a0, root=0xda34a0, trans=0xe709b0) at ctree.h:2312
#11 record_extent (flags=0, allocated=, back=0x95cb3d90, rec=0x95cb3cc0, path=0xde317a0, info=0xda3010, trans=0xe709b0) at cmds-check.c:4438
#12 fixup_extent_refs (trans=trans@entry=0xe709b0, info=, extent_cache=extent_cache@entry=0x7fff24da2970, rec=rec@entry=0x95cb3cc0) at cmds-check.c:5287
#13 0x000000000041ac01 in check_extent_refs (extent_cache=0x7fff24da2970, root=, trans=) at cmds-check.c:5511
#14 check_chunks_and_extents (root=root@entry=0xfa7c70) at cmds-check.c:5978
#15 0x000000000041bdd9 in cmd_check (argc=, argv=) at cmds-check.c:6723
#16 0x0000000000404481 in main (argc=4, argv=0x7fff24da2fe0) at btrfs.c:247

I checked node, node->next, node->next->next, node->next->prev, etc. and saw no obvious loop, at least not in the immediate vicinity of node. The value of node is different each time I check it.

Periodically I see the following backtrace instead:

#0  __list_del (next=0x1326fe820, prev=0xda3088) at list.h:113
#1  list_move_tail (head=0xda3088, list=0x1514b40f0) at list.h:183
#2  free_some_buffers (tree=0xda3078) at extent_io.c:560
#3  __alloc_extent_buffer (blocksize=4096, bytenr=, tree=0xda3078) at extent_io.c:592
#4  alloc_extent_buffer (tree=0xda3078, bytenr=, blocksize=4096) at extent_io.c:671
#5  0x000000000042be29 in btrfs_find_create_tree_block (root=root@entry=0xda34a0, bytenr=, blocksize=) at disk-io.c:133
#6  0x000000000042d683 in read_tree_block (root=0xda34a0, bytenr=, blocksize=, parent_transid=161580) at disk-io.c:260
#7  0x0000000000427c58 in read_node_slot (root=root@entry=0xda34a0, parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
#8  0x0000000000428558 in push_leaf_right (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, path=path@entry=0xde317a0, data_size=data_size@entry=67, empty=empty@entry=0) at ctree.c:1608
#9  0x0000000000428e4c in split_leaf (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, ins_key=ins_key@entry=0x7fff24da24b0, path=path@entry=0xde317a0, data_size=data_size@entry=67, extend=extend@entry=0) at ctree.c:1977
#10 0x000000000042aa54 in btrfs_search_slot (trans=0xe709b0, root=root@entry=0xda34a0, key=key@entry=0x7fff24da24b0, p=p@entry=0xde317a0, ins_len=ins_len@entry=67, cow=cow@entry=1) at ctree.c:1120
#11 0x000000000042af51 in btrfs_insert_empty_items (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0, path=path@entry=0xde317a0, cpu_key=cpu_key@entry=0x7fff24da24b0, data_size=data_size@entry=0x7fff24da24a0, nr=nr@entry=1) at ctree.c:2412
#12 0x00000000004175f6 in btrfs_insert_empty_item (data_size=42, key=0x7fff24da24b0, path=0xde317a0, root=0xda34a0, trans=0xe709b0) at ctree.h:2312
#13 record_extent (flags=0, allocated=, back=0x95cb3d90, rec=0x95cb3cc0, path=0xde317a0, info=0xda3010, trans=0xe709b0) at cmds-check.c:4438
#14 fixup_extent_refs (trans=trans@entry=0xe709b0, info=, extent_cache=extent_cache@entry=0x7fff24da2970, rec=rec@entry=0x95cb3cc0) at cmds-check.c:5287
#15 0x000000000041ac01 in check_extent_refs (extent_cache=0x7fff24da2970, root=, trans=) at cmds-check.c:5511
#16 check_chunks_and_extents (root=root@entry=0xfa7c70) at cmds-check.c:5978
#17 0x000000000041bdd9 in cmd_check (argc=, argv=) at cmds-check.c:6723
#18 0x0000000000404481 in main (argc=4, argv=0x7fff24da2fe0) at btrfs.c:247

If there's interest in debugging, I can leave this machine in this condition for a few days. It's just a backup server, so losing the fs won't be the end of the world.

--Larkin
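Both backtraces sit inside free_some_buffers, the second one in list_move_tail, while the list around node looked intact and node kept changing. One way a list walk can spin forever with no cycle in the list itself is sketched here as a simplified, hypothetical model in C. It assumes a scan that evicts entries whose refs == 1, rotates every other entry to the tail, and saves the next pointer before touching the current entry (as list_for_each_safe does). The names buf, scan, head_reachable and demo, the eviction rule, and the step cap are illustrative assumptions, not the actual btrfs-progs code in extent_io.c.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-style circular doubly linked list with a sentinel head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_move_tail(struct list_head *e, struct list_head *h)
{
    list_del(e);
    list_add_tail(e, h);
}

/* Hypothetical cached buffer: only the reference count matters here. */
struct buf {
    struct list_head lru;
    int refs;
};

#define BUF_OF(p) ((struct buf *)((char *)(p) - offsetof(struct buf, lru)))

/* Hypothetical LRU scan: evict entries with refs == 1, rotate the rest to
 * the tail. `next` is saved before the entry is moved, so once two or more
 * pinned entries remain, the walk keeps chasing rotated entries and never
 * arrives back at head. max_steps exists only so this demo terminates. */
static size_t scan(struct list_head *head, int want, size_t max_steps)
{
    struct list_head *node = head->next;
    size_t steps = 0;
    int freed = 0;

    while (node != head && steps < max_steps) {
        struct list_head *next = node->next;
        steps++;
        if (BUF_OF(node)->refs == 1) {
            list_del(node);
            if (++freed >= want)
                break;
        } else {
            list_move_tail(node, head);
        }
        node = next;
    }
    return steps;
}

/* Returns 1 if following ->next from head gets back to head within
 * `limit` hops, i.e. the list is intact with no stray cycle. */
static int head_reachable(const struct list_head *head, size_t limit)
{
    const struct list_head *p = head->next;
    for (size_t i = 0; i <= limit; i++) {
        if (p == head)
            return 1;
        p = p->next;
    }
    return 0;
}

/* Builds four pinned buffers (refs == 2), optionally makes one evictable,
 * runs the scan with a step cap, and reports list integrity in *intact. */
static size_t demo(int one_free, size_t cap, int *intact)
{
    enum { N = 4 };
    struct buf bufs[N];
    struct list_head head;

    list_init(&head);
    for (int i = 0; i < N; i++) {
        bufs[i].refs = 2;
        list_add_tail(&bufs[i].lru, &head);
    }
    if (one_free)
        bufs[2].refs = 1;

    size_t steps = scan(&head, 1, cap);
    *intact = head_reachable(&head, N);
    assert(steps <= cap);
    return steps;
}
```

With every buffer pinned, demo(0, 1000, &intact) only stops at the artificial cap while head_reachable still reports an intact list, which would match seeing no obvious loop around a constantly changing node. Making a single buffer evictable, demo(1, 1000, &intact), lets the walk finish after three steps.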