From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from vs2.lukas-pirl.de ([5.45.100.90]:55200 "EHLO pim.lukas-pirl.de" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1161853AbbKTJEd (ORCPT );
	Fri, 20 Nov 2015 04:04:33 -0500
Received: from [192.168.1.5] (unknown [119.224.19.150])
	by pim.lukas-pirl.de (Postfix) with ESMTPSA id AB0D21FC0BEC
	for ; Fri, 20 Nov 2015 09:04:29 +0000 (UTC)
From: Lukas Pirl
Subject: 4.2.6: livelock in recovery (free_reloc_roots)?
To: linux-btrfs@vger.kernel.org
Message-ID: <564EE213.3060007@lukas-pirl.de>
Date: Fri, 20 Nov 2015 22:04:19 +1300
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Dear list,

I am (still) trying to recover a RAID1 that can only be mounted with
recovery,degraded,ro.

I experienced an issue that might be interesting for you: I tried to
mount the file system rw,recovery, and the kernel ended up burning one
core (and only one specific core; the task was never scheduled to
another one).

The watchdog printed a stack trace roughly every 20 seconds. Only a few
distinct stack traces were printed, alternating (see below). After a few
hours, with the mount command still blocked and no visible IO activity,
the system was power-cycled.

Summary:

Call Trace:
 [] ? free_reloc_roots+0x11/0x30 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
…

Call Trace:
 [] ? rcu_dump_cpu_stacks+0x80/0xb0
 [] ? rcu_check_callbacks+0x421/0x6e0
 [] ? sched_clock+0x5/0x10
 [] ? notifier_call_chain+0x45/0x70
 [] ? timekeeping_update+0xf1/0x150
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? update_process_times+0x36/0x60
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? tick_sched_handle.isra.15+0x24/0x60
 [] ? tick_sched_do_timer+0x40/0x40
 [] ? tick_sched_timer+0x3b/0x70
 [] ? __hrtimer_run_queues+0xdc/0x210
 [] ? read_tsc+0x5/0x10
 [] ? read_tsc+0x5/0x10
 [] ? hrtimer_interrupt+0x9a/0x190
 [] ? smp_apic_timer_interrupt+0x39/0x50
 [] ? apic_timer_interrupt+0x6b/0x70
 [] ? _raw_spin_lock+0x10/0x20
 [] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [] ? __add_reloc_root+0xe0/0xe0 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
…

Call Trace:
 [] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [] ? btrfs_mount+0x87b/0x990 [btrfs]
 [] ? pcpu_next_unpop+0x3f/0x50
 [] ? mount_fs+0x36/0x170
 [] ? vfs_kern_mount+0x68/0x110
 [] ? btrfs_mount+0x1bb/0x990 [btrfs]
…

A longer excerpt can be found here:
http://pastebin.com/NPM0Ckfy

I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.

btrfs check --readonly gave no errors, except the probable false
positives mentioned here:
http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html

Reading the whole file system also worked.

If you need more information to trace this back, let me know and I'll
try to get it. If you have suggestions regarding the recovery, please
let me know as well.

Best regards,

Lukas