From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from vs2.lukas-pirl.de ([5.45.100.90]:55940 "EHLO pim.lukas-pirl.de" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751536AbcCBNB7 (ORCPT );
        Wed, 2 Mar 2016 08:01:59 -0500
Received: from [192.168.1.108] (p5DC471EE.dip0.t-ipconnect.de [93.196.113.238])
        by pim.lukas-pirl.de (Postfix) with ESMTPSA id 7A62B2E7F74A for ;
        Wed, 2 Mar 2016 12:56:53 +0000 (UTC)
Subject: Still in 4.4.0: livelock in recovery (free_reloc_roots)
To: linux-btrfs@vger.kernel.org
References: <564EE213.3060007@lukas-pirl.de>
From: Lukas Pirl
Message-ID: <56D6E314.1050100@lukas-pirl.de>
Date: Wed, 2 Mar 2016 13:56:52 +0100
MIME-Version: 1.0
In-Reply-To: <564EE213.3060007@lukas-pirl.de>
Content-Type: text/plain; charset=utf-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 11/20/2015 10:04 AM, Lukas Pirl wrote as excerpted:
> I am (still) trying to recover a RAID1 that can only be mounted
> recovery,degraded,ro.
>
> I experienced an issue that might be interesting for you: I tried to
> mount the file system rw,recovery and the kernel ended up burning one
> core (and only one specific core, never scheduled to another one).
>
> The watchdog printed a stack trace roughly every 20 seconds. There were
> only a few stack traces that were printed alternating (see below).
> After a few hours with the mount command still being blocked and without
> visible IO activity, the system was power-cycled.
>
> Summary:
>
> Call Trace:
> [] ? free_reloc_roots+0x11/0x30 [btrfs]
> [] ? free_reloc_roots+0x1d/0x30 [btrfs]
> [] ? merge_reloc_roots+0x165/0x220 [btrfs]
> [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
> [] ? open_ctree+0x20d2/0x23b0 [btrfs]
> [] ? btrfs_mount+0x87b/0x990 [btrfs]
> [] ? pcpu_next_unpop+0x3f/0x50
> [] ? mount_fs+0x36/0x170
> [] ? vfs_kern_mount+0x68/0x110
> [] ? btrfs_mount+0x1bb/0x990 [btrfs]
> …
>
> Call Trace:
> [] ? rcu_dump_cpu_stacks+0x80/0xb0
> [] ? rcu_check_callbacks+0x421/0x6e0
> [] ? sched_clock+0x5/0x10
> [] ? notifier_call_chain+0x45/0x70
> [] ? timekeeping_update+0xf1/0x150
> [] ? tick_sched_do_timer+0x40/0x40
> [] ? update_process_times+0x36/0x60
> [] ? tick_sched_do_timer+0x40/0x40
> [] ? tick_sched_handle.isra.15+0x24/0x60
> [] ? tick_sched_do_timer+0x40/0x40
> [] ? tick_sched_timer+0x3b/0x70
> [] ? __hrtimer_run_queues+0xdc/0x210
> [] ? read_tsc+0x5/0x10
> [] ? read_tsc+0x5/0x10
> [] ? hrtimer_interrupt+0x9a/0x190
> [] ? smp_apic_timer_interrupt+0x39/0x50
> [] ? apic_timer_interrupt+0x6b/0x70
> [] ? _raw_spin_lock+0x10/0x20
> [] ? __del_reloc_root+0x2f/0x100 [btrfs]
> [] ? __add_reloc_root+0xe0/0xe0 [btrfs]
> [] ? free_reloc_roots+0x1d/0x30 [btrfs]
> [] ? merge_reloc_roots+0x165/0x220 [btrfs]
> [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
> [] ? open_ctree+0x20d2/0x23b0 [btrfs]
> [] ? btrfs_mount+0x87b/0x990 [btrfs]
> [] ? pcpu_next_unpop+0x3f/0x50
> [] ? mount_fs+0x36/0x170
> [] ? vfs_kern_mount+0x68/0x110
> [] ? btrfs_mount+0x1bb/0x990 [btrfs]
> …
>
> Call Trace:
> [] ? __del_reloc_root+0x2f/0x100 [btrfs]
> [] ? free_reloc_roots+0x1d/0x30 [btrfs]
> [] ? merge_reloc_roots+0x165/0x220 [btrfs]
> [] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
> [] ? open_ctree+0x20d2/0x23b0 [btrfs]
> [] ? btrfs_mount+0x87b/0x990 [btrfs]
> [] ? pcpu_next_unpop+0x3f/0x50
> [] ? mount_fs+0x36/0x170
> [] ? vfs_kern_mount+0x68/0x110
> [] ? btrfs_mount+0x1bb/0x990 [btrfs]
> …
>
> A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy
>
> I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.
>
> btrfs check --readonly gave no errors.
> (except the probably false positives mentioned here
> http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)
>
> Reading the whole file system worked also.
>
> If you need more information to trace this back, let me know and I'll
> try to get it.
> If you have suggestions regarding the recovery, please let me know as well.