From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:62368 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1751429AbaFQEnP (ORCPT );
	Tue, 17 Jun 2014 00:43:15 -0400
Message-ID: <539FC896.5010909@fb.com>
Date: Tue, 17 Jun 2014 00:48:22 -0400
From: Chris Mason
MIME-Version: 1.0
To: Tsutomu Itoh
CC: Linux Btrfs
Subject: Re: [3.16-rc1] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u25:4:5189]
References: <539E9026.1020406@jp.fujitsu.com> <539F2FEA.1060809@fb.com>
	<539F7D8E.9010608@jp.fujitsu.com> <539F8351.3010700@fb.com>
	<539F846A.4080608@jp.fujitsu.com> <539F9006.4090109@fb.com>
	<539FA395.5050704@jp.fujitsu.com> <539FB0AE.7010404@jp.fujitsu.com>
In-Reply-To: <539FB0AE.7010404@jp.fujitsu.com>
Content-Type: text/plain; charset="ISO-2022-JP"
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 06/16/2014 11:06 PM, Tsutomu Itoh wrote:
> On 2014/06/17 11:10, Tsutomu Itoh wrote:
>> On 2014/06/17 9:47, Chris Mason wrote:
>>> On 06/16/2014 07:57 PM, Tsutomu Itoh wrote:
>>>> On 2014/06/17 8:52, Chris Mason wrote:
>>>>> On 06/16/2014 07:28 PM, Tsutomu Itoh wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> On 2014/06/17 2:56, Chris Mason wrote:
>>>>>>> On 06/16/2014 02:35 AM, Tsutomu Itoh wrote:
>>>>>>>> I encountered soft lockup when executing 'xfstests btrfs/042' on 3.16-rc1.
>>>>>>>>
>>>>>>>
>>>>>>> Did we recover, or was it stuck forever?
>>>>>>
>>>>>> The following messages are repeatedly output.
>>>>>> And stuck forever.
>>>>>>
>>>>>> [ 1147.942181] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u25:4:5189]
>>>>>> [ 1147.967175] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/u25:9:5194]
>>>>>> [ 1147.979172] BUG: soft lockup - CPU#4 stuck for 23s! [kworker/u25:15:5200]
>>>>>> [ 1147.991169] BUG: soft lockup - CPU#5 stuck for 23s! [kworker/u25:7:5192]
>>>>>> [ 1148.064153] BUG: soft lockup - CPU#6 stuck for 23s! [kworker/u26:3:3182]
>>>>>
>>>>> Can you please capture a stack trace from all the cpus?
>>>
>>> Very strange, please try to reproduce again, I'll dig through things here.
>>
>> I can reproduce it easily in my environment.
>
> This is my reproducer.
>

Great, I was able to trigger it here, but only with lockdep disabled.

When called from this part of btrfs_search_slot, the code for
btrfs_set_path_blocking goes through an extra step to set the lock on
the extent buffer we just found blocking before it is put into the path.

But this is only done with lockdep on. With lockdep off, we assume the
lock ordering inside the tree is protecting us and that we don't need to
worry about blocks that are not in the path yet. Something is breaking
this rule, probably in the quota code.

I'll try to nail down what is going on.

-chris