From: Tsutomu Itoh
Date: Fri, 20 Jun 2014 12:20:55 +0900
To: Chris Mason
CC: Waiman Long, Marc Dionne, Josef Bacik, linux-btrfs@vger.kernel.org
Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected
Message-ID: <53A3A897.3050202@jp.fujitsu.com>
In-Reply-To: <53A3708E.9060203@hp.com>

On 2014/06/20 8:21, Waiman Long wrote:
> On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>>
>>>> I would like to take back my comments. I took out the read_lock, but the
>>>> process still hangs while doing file activities on a btrfs filesystem. So
>>>> the problem is trickier than I thought. Below are the stack backtraces
>>>> of some of the relevant processes.
>>>>
>>> You weren't wrong, but it was also the tree trylock code. Our trylocks
>>> only back off if the blocking lock is held. btrfs_next_leaf needs it to
>>> be a true trylock. The confusing part is this hasn't really changed,
>>> but one of the callers must be a spinner where we used to have a blocker.
>> This is what I have queued up, it's working here.
>>
>> -chris
>>
>> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>> Author: Chris Mason
>> Date: Thu Jun 19 14:16:52 2014 -0700
>>
>>     Btrfs: fix deadlocks with trylock on tree nodes
>>
>>     The Btrfs tree trylock function is poorly named. It always takes
>>     the spinlock and backs off if the blocking lock is held. This
>>     can lead to surprising lockups because people expect it to really be a
>>     trylock.
>>
>>     This commit makes it a pure trylock, both for the spinlock and the
>>     blocking lock. It also reworks the nested lock handling slightly to
>>     avoid taking the read lock while a spinning write lock might be held.
>>
>>     Signed-off-by: Chris Mason
>
> I didn't realize that those non-blocking lock functions are really
> trylocks. Yes, the patch did seem to fix the hanging problem that I saw
> when I just untar the kernel source files into a btrfs filesystem.
> However, when I tried a kernel build on a 24-thread (-j 24) system, the
> build process hung after a while. The following kind of stack trace
> messages were printed:
>
> INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
>       Tainted: G E 3.16.0-rc1 #5
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 000000000000000f 0 16576 2 0x00000000
>  ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
>  0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
>  ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_timeout+0x13d/0x1d0
>  [] ? wake_up_worker+0x24/0x30
>  [] ? insert_work+0x65/0xb0
>  [] wait_for_completion+0xc6/0x100
>  [] ? try_to_wake_up+0x220/0x220
>  [] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
>  [] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
>  [] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
>  [] btrfs_commit_transaction+0x229/0xa30 [btrfs]
>  [] ? lock_timer_base+0x70/0x70
>  [] transaction_kthread+0x1eb/0x270 [btrfs]
>  [] ? close_ctree+0x2d0/0x2d0 [btrfs]
>  [] kthread+0xce/0xf0
>  [] ? kthread_freezable_should_stop+0x70/0x70
>  [] ret_from_fork+0x7c/0xb0
>  [] ? kthread_freezable_should_stop+0x70/0x70
>
> It looks like some more work may still be needed. Or it could be a
> problem in my system configuration.
>

Umm, after applying Chris's patch to my environment, xfstests ran to
completion and the above messages were not output.
(Are the above messages from another bug?)

Thanks,
Tsutomu
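
For readers following the thread, the distinction drawn in the quoted commit message (a "trylock" that still waits on the spinlock versus a pure trylock that never waits) can be sketched in user space roughly as below. This is only an illustration under stated assumptions, not the btrfs implementation: `struct sketch_lock` and every `sketch_*` function are hypothetical names, and C11 atomics stand in for the kernel's spinning and blocking tree locks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock with a spinning part and a blocking part.
 * Not the btrfs extent buffer lock; names are invented for this sketch. */
struct sketch_lock {
    atomic_bool spin_held;     /* set while a spinning holder owns the lock */
    atomic_int  blocking_held; /* nonzero while a blocking holder exists */
};

/* "Trylock" in name only: it backs off when a blocking holder exists,
 * but otherwise waits for as long as another spinner holds the lock. */
static bool sketch_try_lock_spinning(struct sketch_lock *l)
{
    if (atomic_load(&l->blocking_held))
        return false;
    while (atomic_exchange(&l->spin_held, true))
        ; /* busy-wait until the current spinner releases */
    if (atomic_load(&l->blocking_held)) {
        /* A blocking holder appeared in the meantime: undo and back off. */
        atomic_store(&l->spin_held, false);
        return false;
    }
    return true;
}

/* Pure trylock: a single attempt on both parts, never waits. */
static bool sketch_try_lock_pure(struct sketch_lock *l)
{
    if (atomic_load(&l->blocking_held))
        return false;
    if (atomic_exchange(&l->spin_held, true))
        return false; /* someone else is spinning on it: give up at once */
    if (atomic_load(&l->blocking_held)) {
        atomic_store(&l->spin_held, false);
        return false;
    }
    return true;
}

/* Release the spinning part of the lock. */
static void sketch_unlock(struct sketch_lock *l)
{
    atomic_store(&l->spin_held, false);
}
```

In this sketch, a caller that already holds one lock and calls `sketch_try_lock_spinning()` on a second can wait indefinitely if the second lock's holder is in turn waiting on the first, which is the kind of surprise the commit message describes; `sketch_try_lock_pure()` returns false immediately in that situation and lets the caller back out.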