* Lockups with btrfs on 3.16-rc1 - bisected
@ 2014-06-18 20:57 Marc Dionne
2014-06-18 22:17 ` Waiman Long
2014-06-19 9:49 ` btrfs-transacti:516 blocked 120 seconds on 3.16-rc1 Konstantinos Skarlatos
0 siblings, 2 replies; 23+ messages in thread
From: Marc Dionne @ 2014-06-18 20:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: clm, Waiman.Long, t-itoh
Hi,
I've been seeing very reproducible soft lockups with 3.16-rc1 similar
to what is reported here:
http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
occasional hard lockup, making it impossible to complete a parallel
build on a btrfs filesystem for the package I work on. This was
working fine just a few days before rc1.
Bisecting brought me to the following commit:
commit bd01ec1a13f9a327950c8e3080096446c7804753
Author: Waiman Long <Waiman.Long@hp.com>
Date: Mon Feb 3 13:18:57 2014 +0100
x86, locking/rwlocks: Enable qrwlocks on x86
And sure enough if I revert that commit on top of current mainline,
I'm unable to reproduce the soft lockups and hangs.
Marc
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 20:57 Lockups with btrfs on 3.16-rc1 - bisected Marc Dionne
@ 2014-06-18 22:17 ` Waiman Long
2014-06-18 22:27 ` Josef Bacik
2014-06-19 9:49 ` btrfs-transacti:516 blocked 120 seconds on 3.16-rc1 Konstantinos Skarlatos
1 sibling, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-18 22:17 UTC (permalink / raw)
To: Marc Dionne; +Cc: linux-btrfs, clm, t-itoh
On 06/18/2014 04:57 PM, Marc Dionne wrote:
> Hi,
>
> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
> to what is reported here:
> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
> occasional hard lockup, making it impossible to complete a parallel
> build on a btrfs filesystem for the package I work on. This was
> working fine just a few days before rc1.
>
> Bisecting brought me to the following commit:
>
> commit bd01ec1a13f9a327950c8e3080096446c7804753
> Author: Waiman Long<Waiman.Long@hp.com>
> Date: Mon Feb 3 13:18:57 2014 +0100
>
> x86, locking/rwlocks: Enable qrwlocks on x86
>
> And sure enough if I revert that commit on top of current mainline,
> I'm unable to reproduce the soft lockups and hangs.
>
> Marc
The queue rwlock is fair. As a result, a recursive read_lock is not
allowed unless the task is in an interrupt context. Doing a recursive
read_lock will hang the process when a write_lock happens somewhere in
between. Are recursive read_locks being taken in the btrfs code?
-Longman
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 22:17 ` Waiman Long
@ 2014-06-18 22:27 ` Josef Bacik
2014-06-18 22:47 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2014-06-18 22:27 UTC (permalink / raw)
To: Waiman Long, Marc Dionne; +Cc: linux-btrfs, clm, t-itoh
On 06/18/2014 03:17 PM, Waiman Long wrote:
> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>> Hi,
>>
>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>> to what is reported here:
>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>> occasional hard lockup, making it impossible to complete a parallel
>> build on a btrfs filesystem for the package I work on. This was
>> working fine just a few days before rc1.
>>
>> Bisecting brought me to the following commit:
>>
>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>> Author: Waiman Long<Waiman.Long@hp.com>
>> Date: Mon Feb 3 13:18:57 2014 +0100
>>
>> x86, locking/rwlocks: Enable qrwlocks on x86
>>
>> And sure enough if I revert that commit on top of current mainline,
>> I'm unable to reproduce the soft lockups and hangs.
>>
>> Marc
>
> The queue rwlock is fair. As a result, recursive read_lock is not
> allowed unless the task is in an interrupt context. Doing recursive
> read_lock will hang the process when a write_lock happens somewhere in
> between. Are recursive read_lock being done in the btrfs code?
>
We walk down a tree and read-lock each node as we go, is that what
you mean? Or do you mean taking read_lock multiple times on the same
lock in the same process? Because we definitely don't do that. Thanks,
Josef
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 22:27 ` Josef Bacik
@ 2014-06-18 22:47 ` Waiman Long
2014-06-18 23:10 ` Josef Bacik
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-18 22:47 UTC (permalink / raw)
To: Josef Bacik; +Cc: Marc Dionne, linux-btrfs, clm, t-itoh
On 06/18/2014 06:27 PM, Josef Bacik wrote:
>
>
> On 06/18/2014 03:17 PM, Waiman Long wrote:
>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>> Hi,
>>>
>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>> to what is reported here:
>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>> occasional hard lockup, making it impossible to complete a parallel
>>> build on a btrfs filesystem for the package I work on. This was
>>> working fine just a few days before rc1.
>>>
>>> Bisecting brought me to the following commit:
>>>
>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>> Author: Waiman Long<Waiman.Long@hp.com>
>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>
>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>
>>> And sure enough if I revert that commit on top of current mainline,
>>> I'm unable to reproduce the soft lockups and hangs.
>>>
>>> Marc
>>
>> The queue rwlock is fair. As a result, recursive read_lock is not
>> allowed unless the task is in an interrupt context. Doing recursive
>> read_lock will hang the process when a write_lock happens somewhere in
>> between. Are recursive read_lock being done in the btrfs code?
>>
>
> We walk down a tree and read lock each node as we walk down, is that
> what you mean? Or do you mean read_lock multiple times on the same
> lock in the same process, cause we definitely don't do that. Thanks,
>
> Josef
I meant recursively read_lock the same lock in a process.
-Longman
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 22:47 ` Waiman Long
@ 2014-06-18 23:10 ` Josef Bacik
2014-06-18 23:19 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2014-06-18 23:10 UTC (permalink / raw)
To: Waiman Long; +Cc: Marc Dionne, linux-btrfs, clm, t-itoh
On 06/18/2014 03:47 PM, Waiman Long wrote:
> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>
>>
>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>> Hi,
>>>>
>>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>>> to what is reported here:
>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>> occasional hard lockup, making it impossible to complete a parallel
>>>> build on a btrfs filesystem for the package I work on. This was
>>>> working fine just a few days before rc1.
>>>>
>>>> Bisecting brought me to the following commit:
>>>>
>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>
>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>
>>>> And sure enough if I revert that commit on top of current mainline,
>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>
>>>> Marc
>>>
>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>> allowed unless the task is in an interrupt context. Doing recursive
>>> read_lock will hang the process when a write_lock happens somewhere in
>>> between. Are recursive read_lock being done in the btrfs code?
>>>
>>
>> We walk down a tree and read lock each node as we walk down, is that
>> what you mean? Or do you mean read_lock multiple times on the same
>> lock in the same process, cause we definitely don't do that. Thanks,
>>
>> Josef
>
> I meant recursively read_lock the same lock in a process.
I take it back, we do actually do this in some cases. Thanks,
Josef
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 23:10 ` Josef Bacik
@ 2014-06-18 23:19 ` Waiman Long
2014-06-18 23:27 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-18 23:19 UTC (permalink / raw)
To: Josef Bacik; +Cc: Marc Dionne, linux-btrfs, clm, t-itoh
On 06/18/2014 07:10 PM, Josef Bacik wrote:
>
>
> On 06/18/2014 03:47 PM, Waiman Long wrote:
>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>
>>>
>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>> Hi,
>>>>>
>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>>>> to what is reported here:
>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>> working fine just a few days before rc1.
>>>>>
>>>>> Bisecting brought me to the following commit:
>>>>>
>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>
>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>
>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>
>>>>> Marc
>>>>
>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>> read_lock will hang the process when a write_lock happens somewhere in
>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>
>>>
>>> We walk down a tree and read lock each node as we walk down, is that
>>> what you mean? Or do you mean read_lock multiple times on the same
>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>
>>> Josef
>>
>> I meant recursively read_lock the same lock in a process.
>
> I take it back, we do actually do this in some cases. Thanks,
>
> Josef
This is what I thought when I looked at the locking code in btrfs. The
unlock code doesn't clear the lock_owner pid, which may cause
lock_nested to be set incorrectly.
Anyway, are you going to do something about it?
-Longman
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 23:19 ` Waiman Long
@ 2014-06-18 23:27 ` Chris Mason
2014-06-18 23:30 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-18 23:27 UTC (permalink / raw)
To: Waiman Long, Josef Bacik; +Cc: Marc Dionne, linux-btrfs, t-itoh
On 06/18/2014 07:19 PM, Waiman Long wrote:
> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>
>>
>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>
>>>>
>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>>>>> to what is reported here:
>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>> working fine just a few days before rc1.
>>>>>>
>>>>>> Bisecting brought me to the following commit:
>>>>>>
>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>
>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>
>>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>
>>>>>> Marc
>>>>>
>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>>> read_lock will hang the process when a write_lock happens somewhere in
>>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>>
>>>>
>>>> We walk down a tree and read lock each node as we walk down, is that
>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>>
>>>> Josef
>>>
>>> I meant recursively read_lock the same lock in a process.
>>
>> I take it back, we do actually do this in some cases. Thanks,
>>
>> Josef
>
> This is what I thought when I looked at the locking code in btrfs. The
> unlock code doesn't clear the lock_owner pid, which may cause
> lock_nested to be set incorrectly.
>
> Anyway, are you going to do something about it?
Thanks for reporting this, we shouldn't be actually taking the lock
recursively. Could you please try with lockdep enabled? If the problem
goes away with lockdep on, I think I know what's causing it. Otherwise,
lockdep should clue us in.
-chris
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 23:27 ` Chris Mason
@ 2014-06-18 23:30 ` Waiman Long
2014-06-18 23:53 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-18 23:30 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, Marc Dionne, linux-btrfs, t-itoh
On 06/18/2014 07:27 PM, Chris Mason wrote:
> On 06/18/2014 07:19 PM, Waiman Long wrote:
>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>
>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>
>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>>>>>> to what is reported here:
>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>> working fine just a few days before rc1.
>>>>>>>
>>>>>>> Bisecting brought me to the following commit:
>>>>>>>
>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>
>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>
>>>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>
>>>>>>> Marc
>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>>>> read_lock will hang the process when a write_lock happens somewhere in
>>>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>>>
>>>>> We walk down a tree and read lock each node as we walk down, is that
>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>>>
>>>>> Josef
>>>> I meant recursively read_lock the same lock in a process.
>>> I take it back, we do actually do this in some cases. Thanks,
>>>
>>> Josef
>> This is what I thought when I looked at the locking code in btrfs. The
>> unlock code doesn't clear the lock_owner pid, which may cause
>> lock_nested to be set incorrectly.
>>
>> Anyway, are you going to do something about it?
> Thanks for reporting this, we shouldn't be actually taking the lock
> recursively. Could you please try with lockdep enabled? If the problem
> goes away with lockdep on, I think I know what's causing it. Otherwise,
> lockdep should clue us in.
>
> -chris
I am not sure if lockdep will report recursive read_lock, as this was
allowed in the past. If not, we certainly need to add that capability
to it.
One more thing: I saw a comment in the btrfs tree locking code about
taking a read lock after taking a write (partial?) lock. That is not
possible even with the old rwlock code.
-Longman
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 23:30 ` Waiman Long
@ 2014-06-18 23:53 ` Chris Mason
2014-06-19 0:03 ` Marc Dionne
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-18 23:53 UTC (permalink / raw)
To: Waiman Long; +Cc: Josef Bacik, Marc Dionne, linux-btrfs, t-itoh
On 06/18/2014 07:30 PM, Waiman Long wrote:
> On 06/18/2014 07:27 PM, Chris Mason wrote:
>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>
>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>
>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>> similar
>>>>>>>> to what is reported here:
>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>>> working fine just a few days before rc1.
>>>>>>>>
>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>
>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>
>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>
>>>>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>
>>>>>>>> Marc
>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>>>>> read_lock will hang the process when a write_lock happens
>>>>>>> somewhere in
>>>>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>>>>
>>>>>> We walk down a tree and read lock each node as we walk down, is that
>>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>>>>
>>>>>> Josef
>>>>> I meant recursively read_lock the same lock in a process.
>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>
>>>> Josef
>>> This is what I thought when I looked at the locking code in btrfs. The
>>> unlock code doesn't clear the lock_owner pid, which may cause
>>> lock_nested to be set incorrectly.
>>>
>>> Anyway, are you going to do something about it?
>> Thanks for reporting this, we shouldn't be actually taking the lock
>> recursively. Could you please try with lockdep enabled? If the problem
>> goes away with lockdep on, I think I know what's causing it. Otherwise,
>> lockdep should clue us in.
>>
>> -chris
>
> I am not sure if lockdep will report recursive read_lock, as this was
> allowed in the past. If not, we certainly need to add that capability
> to it.
>
> One more thing: I saw a comment in the btrfs tree locking code about
> taking a read lock after taking a write (partial?) lock. That is not
> possible even with the old rwlock code.
With lockdep on, the clear_path_blocking function you're hitting
softlockups in is different. Fujitsu hit a similar problem during
quota rescans, and it goes away with lockdep on. I'm trying to nail
down where we went wrong, but please try lockdep on.
-chris
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-18 23:53 ` Chris Mason
@ 2014-06-19 0:03 ` Marc Dionne
2014-06-19 0:08 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Marc Dionne @ 2014-06-19 0:03 UTC (permalink / raw)
To: Chris Mason; +Cc: Waiman Long, Josef Bacik, linux-btrfs, t-itoh
On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason <clm@fb.com> wrote:
> On 06/18/2014 07:30 PM, Waiman Long wrote:
>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>
>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>
>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>>> similar
>>>>>>>>> to what is reported here:
>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>>>> working fine just a few days before rc1.
>>>>>>>>>
>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>
>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>
>>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>
>>>>>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>
>>>>>>>>> Marc
>>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>>>>>> read_lock will hang the process when a write_lock happens
>>>>>>>> somewhere in
>>>>>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>>>>>
>>>>>>> We walk down a tree and read lock each node as we walk down, is that
>>>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>>>>>
>>>>>>> Josef
>>>>>> I meant recursively read_lock the same lock in a process.
>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>
>>>>> Josef
>>>> This is what I thought when I looked at the locking code in btrfs. The
>>>> unlock code doesn't clear the lock_owner pid, which may cause
>>>> lock_nested to be set incorrectly.
>>>>
>>>> Anyway, are you going to do something about it?
>>> Thanks for reporting this, we shouldn't be actually taking the lock
>>> recursively. Could you please try with lockdep enabled? If the problem
>>> goes away with lockdep on, I think I know what's causing it. Otherwise,
>>> lockdep should clue us in.
>>>
>>> -chris
>>
>> I am not sure if lockdep will report recursive read_lock, as this was
>> allowed in the past. If not, we certainly need to add that capability
>> to it.
>>
>> One more thing: I saw a comment in the btrfs tree locking code about
>> taking a read lock after taking a write (partial?) lock. That is not
>> possible even with the old rwlock code.
>
> With lockdep on, the clear_path_blocking function you're hitting
> softlockups in is different. Fujitsu hit a similar problem during
> quota rescans, and it goes away with lockdep on. I'm trying to nail
> down where we went wrong, but please try lockdep on.
>
> -chris
With lockdep on I'm unable to reproduce the lockups, and there are no
lockdep warnings.
Marc
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 0:03 ` Marc Dionne
@ 2014-06-19 0:08 ` Waiman Long
2014-06-19 0:41 ` Marc Dionne
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-19 0:08 UTC (permalink / raw)
To: Marc Dionne; +Cc: Chris Mason, Josef Bacik, linux-btrfs, t-itoh
On 06/18/2014 08:03 PM, Marc Dionne wrote:
> On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason<clm@fb.com> wrote:
>> On 06/18/2014 07:30 PM, Waiman Long wrote:
>>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>>>> similar
>>>>>>>>>> to what is reported here:
>>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>>>>> occasional hard lockup, making it impossible to complete a parallel
>>>>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>>>>> working fine just a few days before rc1.
>>>>>>>>>>
>>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>>
>>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>>
>>>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>>
>>>>>>>>>> And sure enough if I revert that commit on top of current mainline,
>>>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>>
>>>>>>>>>> Marc
>>>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>>>> allowed unless the task is in an interrupt context. Doing recursive
>>>>>>>>> read_lock will hang the process when a write_lock happens
>>>>>>>>> somewhere in
>>>>>>>>> between. Are recursive read_lock being done in the btrfs code?
>>>>>>>>>
>>>>>>>> We walk down a tree and read lock each node as we walk down, is that
>>>>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>>>>> lock in the same process, cause we definitely don't do that. Thanks,
>>>>>>>>
>>>>>>>> Josef
>>>>>>> I meant recursively read_lock the same lock in a process.
>>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>>
>>>>>> Josef
>>>>> This is what I thought when I looked at the locking code in btrfs. The
>>>>> unlock code doesn't clear the lock_owner pid, which may cause
>>>>> lock_nested to be set incorrectly.
>>>>>
>>>>> Anyway, are you going to do something about it?
>>>> Thanks for reporting this, we shouldn't be actually taking the lock
>>>> recursively. Could you please try with lockdep enabled? If the problem
>>>> goes away with lockdep on, I think I know what's causing it. Otherwise,
>>>> lockdep should clue us in.
>>>>
>>>> -chris
>>> I am not sure if lockdep will report recursive read_lock, as this was
>>> allowed in the past. If not, we certainly need to add that capability
>>> to it.
>>>
>>> One more thing: I saw a comment in the btrfs tree locking code about
>>> taking a read lock after taking a write (partial?) lock. That is not
>>> possible even with the old rwlock code.
>> With lockdep on, the clear_path_blocking function you're hitting
>> softlockups in is different. Fujitsu hit a similar problem during
>> quota rescans, and it goes away with lockdep on. I'm trying to nail
>> down where we went wrong, but please try lockdep on.
>>
>> -chris
> With lockdep on I'm unable to reproduce the lockups, and there are no
> lockdep warnings.
>
> Marc
Enabling lockdep may change the lock timing, making it hard to
reproduce the problem. Anyway, could you try applying the following
patch to see if it shows any warning?
-Longman
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index d24e433..b6c9f2e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_lock
 		if (hlock_class(prev) != hlock_class(next))
 			continue;
+#ifdef CONFIG_QUEUE_RWLOCK
+		/*
+		 * Queue rwlock only allows read-after-read recursion of the
+		 * same lock class when the latter read is in an interrupt
+		 * context.
+		 */
+		if ((read == 2) && prev->read && in_interrupt())
+			return 2;
+#else
 		/*
 		 * Allow read-after-read recursion of the same
 		 * lock class (i.e. read_lock(lock)+read_lock(lock)):
 		 */
 		if ((read == 2) && prev->read)
 			return 2;
+#endif
 		/*
 		 * We're holding the nest_lock, which serializes this lock's
@@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
 	 * write-lock never takes any other locks, then the reads are
 	 * equivalent to a NOP.
 	 */
+#ifndef CONFIG_QUEUE_RWLOCK
 	if (next->read == 2 || prev->read == 2)
 		return 1;
+#endif
 	/*
 	 * Is the <prev> -> <next> dependency already present?
 	 *
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 0:08 ` Waiman Long
@ 2014-06-19 0:41 ` Marc Dionne
2014-06-19 2:03 ` Marc Dionne
0 siblings, 1 reply; 23+ messages in thread
From: Marc Dionne @ 2014-06-19 0:41 UTC (permalink / raw)
To: Waiman Long; +Cc: Chris Mason, Josef Bacik, linux-btrfs, t-itoh
On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>
>> On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason<clm@fb.com> wrote:
>>>
>>> On 06/18/2014 07:30 PM, Waiman Long wrote:
>>>>
>>>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>>>>
>>>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>>>>
>>>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>>>
>>>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>>>>
>>>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>>>
>>>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>>>>
>>>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>>>>> similar
>>>>>>>>>>> to what is reported here:
>>>>>>>>>>>
>>>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>>>>>> occasional hard lockup, making it impossible to complete a
>>>>>>>>>>> parallel
>>>>>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>>>>>> working fine just a few days before rc1.
>>>>>>>>>>>
>>>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>>>
>>>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>>>
>>>>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>>>
>>>>>>>>>>> And sure enough if I revert that commit on top of current
>>>>>>>>>>> mainline,
>>>>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>>>
>>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>>>>> allowed unless the task is in an interrupt context. Doing
>>>>>>>>>> recursive
>>>>>>>>>> read_lock will hang the process when a write_lock happens
>>>>>>>>>> somewhere in
>>>>>>>>>> between. Are recursive read_locks being done in the btrfs code?
>>>>>>>>>>
>>>>>>>>> We walk down a tree and read lock each node as we walk down, is
>>>>>>>>> that
>>>>>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>>>>>> lock in the same process, cause we definitely don't do that.
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>
>>>>>>>> I meant recursively read_lock the same lock in a process.
>>>>>>>
>>>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>>>
>>>>>>> Josef
>>>>>>
>>>>>> This is what I thought when I looked at the locking code in btrfs. The
>>>>>> unlock code doesn't clear the lock_owner pid, which may cause
>>>>>> lock_nested to be set incorrectly.
>>>>>>
>>>>>> Anyway, are you going to do something about it?
>>>>>
>>>>> Thanks for reporting this, we shouldn't be actually taking the lock
>>>>> recursively. Could you please try with lockdep enabled? If the
>>>>> problem
>>>>> goes away with lockdep on, I think I know what's causing it.
>>>>> Otherwise,
>>>>> lockdep should clue us in.
>>>>>
>>>>> -chris
>>>>
>>>> I am not sure if lockdep will report recursive read_lock, as this was
>>>> possible in the past. If not, we certainly need to add that capability
>>>> to it.
>>>>
>>>> One more thing, I saw a comment in the btrfs tree locking code about
>>>> taking a read lock after taking a write (partial?) lock. That is not
>>>> possible even with the old rwlock code.
>>>
>>> With lockdep on, the clear_path_blocking function you're hitting
>>> softlockups in is different. Fujitsu hit a similar problem during
>>> quota rescans, and it goes away with lockdep on. I'm trying to nail
>>> down where we went wrong, but please try lockdep on.
>>>
>>> -chris
>>
>> With lockdep on I'm unable to reproduce the lockups, and there are no
>> lockdep warnings.
>>
>> Marc
>
>
> Enabling lockdep may change the lock timing in a way that makes it hard to
> reproduce the problem. Anyway, could you try to apply the following patch to
> see if it shows any warnings?
>
> -Longman
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index d24e433..b6c9f2e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_loc
> if (hlock_class(prev) != hlock_class(next))
> continue;
>
> +#ifdef CONFIG_QUEUE_RWLOCK
> + /*
> + * Queue rwlock only allows read-after-read recursion of the
> + * same lock class when the latter read is in an interrupt
> + * context.
> + */
> + if ((read == 2) && prev->read && in_interrupt())
> + return 2;
> +#else
> /*
> * Allow read-after-read recursion of the same
> * lock class (i.e. read_lock(lock)+read_lock(lock)):
> */
> if ((read == 2) && prev->read)
> return 2;
> +#endif
>
> /*
> * We're holding the nest_lock, which serializes this lock's
> @@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
> * write-lock never takes any other locks, then the reads are
> * equivalent to a NOP.
> */
> +#ifndef CONFIG_QUEUE_RWLOCK
> if (next->read == 2 || prev->read == 2)
> return 1;
> +#endif
> /*
> * Is the <prev> -> <next> dependency already present?
> *
I still don't see any warnings with this patch added. I also tried it
along with removing a couple of CONFIG_DEBUG_LOCK_ALLOC ifdefs in
btrfs/ctree.c - still unable to generate any warnings or lockups.
Marc
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 0:41 ` Marc Dionne
@ 2014-06-19 2:03 ` Marc Dionne
2014-06-19 2:11 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Marc Dionne @ 2014-06-19 2:03 UTC (permalink / raw)
To: Waiman Long; +Cc: Chris Mason, Josef Bacik, linux-btrfs, t-itoh
On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>
>>> On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason<clm@fb.com> wrote:
>>>>
>>>> On 06/18/2014 07:30 PM, Waiman Long wrote:
>>>>>
>>>>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>>>>>
>>>>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>>>>>
>>>>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>>>>
>>>>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>>>>>
>>>>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>>>>
>>>>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>>>>>> similar
>>>>>>>>>>>> to what is reported here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2 , along with the
>>>>>>>>>>>> occasional hard lockup, making it impossible to complete a
>>>>>>>>>>>> parallel
>>>>>>>>>>>> build on a btrfs filesystem for the package I work on. This was
>>>>>>>>>>>> working fine just a few days before rc1.
>>>>>>>>>>>>
>>>>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>>>>
>>>>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>>>>> Author: Waiman Long<Waiman.Long@hp.com>
>>>>>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>>>>
>>>>>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>>>>
>>>>>>>>>>>> And sure enough if I revert that commit on top of current
>>>>>>>>>>>> mainline,
>>>>>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>>>>
>>>>>>>>>>>> Marc
>>>>>>>>>>>
>>>>>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>>>>>> allowed unless the task is in an interrupt context. Doing
>>>>>>>>>>> recursive
>>>>>>>>>>> read_lock will hang the process when a write_lock happens
>>>>>>>>>>> somewhere in
>>>>>>>>>>> between. Are recursive read_locks being done in the btrfs code?
>>>>>>>>>>>
>>>>>>>>>> We walk down a tree and read lock each node as we walk down, is
>>>>>>>>>> that
>>>>>>>>>> what you mean? Or do you mean read_lock multiple times on the same
>>>>>>>>>> lock in the same process, cause we definitely don't do that.
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Josef
>>>>>>>>>
>>>>>>>>> I meant recursively read_lock the same lock in a process.
>>>>>>>>
>>>>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>>>>
>>>>>>>> Josef
>>>>>>>
>>>>>>> This is what I thought when I looked at the locking code in btrfs. The
>>>>>>> unlock code doesn't clear the lock_owner pid, which may cause
>>>>>>> lock_nested to be set incorrectly.
>>>>>>>
>>>>>>> Anyway, are you going to do something about it?
>>>>>>
>>>>>> Thanks for reporting this, we shouldn't be actually taking the lock
>>>>>> recursively. Could you please try with lockdep enabled? If the
>>>>>> problem
>>>>>> goes away with lockdep on, I think I know what's causing it.
>>>>>> Otherwise,
>>>>>> lockdep should clue us in.
>>>>>>
>>>>>> -chris
>>>>>
>>>>> I am not sure if lockdep will report recursive read_lock, as this was
>>>>> possible in the past. If not, we certainly need to add that capability
>>>>> to it.
>>>>>
>>>>> One more thing, I saw a comment in the btrfs tree locking code about
>>>>> taking a read lock after taking a write (partial?) lock. That is not
>>>>> possible even with the old rwlock code.
>>>>
>>>> With lockdep on, the clear_path_blocking function you're hitting
>>>> softlockups in is different. Fujitsu hit a similar problem during
>>>> quota rescans, and it goes away with lockdep on. I'm trying to nail
>>>> down where we went wrong, but please try lockdep on.
>>>>
>>>> -chris
>>>
>>> With lockdep on I'm unable to reproduce the lockups, and there are no
>>> lockdep warnings.
>>>
>>> Marc
>>
>>
>> Enabling lockdep may change the lock timing in a way that makes it hard to
>> reproduce the problem. Anyway, could you try to apply the following patch to
>> see if it shows any warnings?
>>
>> -Longman
>>
>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>> index d24e433..b6c9f2e 100644
>> --- a/kernel/locking/lockdep.c
>> +++ b/kernel/locking/lockdep.c
>> @@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_loc
>> if (hlock_class(prev) != hlock_class(next))
>> continue;
>>
>> +#ifdef CONFIG_QUEUE_RWLOCK
>> + /*
>> + * Queue rwlock only allows read-after-read recursion of the
>> + * same lock class when the latter read is in an interrupt
>> + * context.
>> + */
>> + if ((read == 2) && prev->read && in_interrupt())
>> + return 2;
>> +#else
>> /*
>> * Allow read-after-read recursion of the same
>> * lock class (i.e. read_lock(lock)+read_lock(lock)):
>> */
>> if ((read == 2) && prev->read)
>> return 2;
>> +#endif
>>
>> /*
>> * We're holding the nest_lock, which serializes this lock's
>> @@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
>> * write-lock never takes any other locks, then the reads are
>> * equivalent to a NOP.
>> */
>> +#ifndef CONFIG_QUEUE_RWLOCK
>> if (next->read == 2 || prev->read == 2)
>> return 1;
>> +#endif
>> /*
>> * Is the <prev> -> <next> dependency already present?
>> *
>
> I still don't see any warnings with this patch added. I also tried it
> along with removing a couple of CONFIG_DEBUG_LOCK_ALLOC ifdefs in
> btrfs/ctree.c - still unable to generate any warnings or lockups.
>
> Marc
And for an additional data point, just removing those
CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
the symptoms when lockdep is not enabled.
Marc
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 2:03 ` Marc Dionne
@ 2014-06-19 2:11 ` Chris Mason
2014-06-19 3:21 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-19 2:11 UTC (permalink / raw)
To: Marc Dionne, Waiman Long; +Cc: Josef Bacik, linux-btrfs, t-itoh
On 06/18/2014 10:03 PM, Marc Dionne wrote:
> On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>
> And for an additional data point, just removing those
> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
> the symptoms when lockdep is not enabled.
Ok, somehow we've added a lock inversion here that wasn't here before.
Thanks for confirming, I'll nail it down.
-chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 2:11 ` Chris Mason
@ 2014-06-19 3:21 ` Waiman Long
2014-06-19 16:51 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-19 3:21 UTC (permalink / raw)
To: Chris Mason; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/18/2014 10:11 PM, Chris Mason wrote:
> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>> On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne<marc.c.dionne@gmail.com> wrote:
>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com> wrote:
>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>> And for an additional data point, just removing those
>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>> the symptoms when lockdep is not enabled.
> Ok, somehow we've added a lock inversion here that wasn't here before.
> Thanks for confirming, I'll nail it down.
>
> -chris
>
I am pretty sure that the hang is caused by the following kind of code
fragment in the locking.c file:

        if (eb->lock_nested) {
                read_lock(&eb->lock);
                if (eb->lock_nested && current->pid == eb->lock_owner) {

Is it possible to do the check without taking the read_lock?
-Longman
^ permalink raw reply [flat|nested] 23+ messages in thread
* btrfs-transacti:516 blocked 120 seconds on 3.16-rc1
2014-06-18 20:57 Lockups with btrfs on 3.16-rc1 - bisected Marc Dionne
2014-06-18 22:17 ` Waiman Long
@ 2014-06-19 9:49 ` Konstantinos Skarlatos
1 sibling, 0 replies; 23+ messages in thread
From: Konstantinos Skarlatos @ 2014-06-19 9:49 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 2320 bytes --]
I am not sure whether this is related to the other reports of lockups etc.
on 3.16-rc1, so I am sending it. Full dmesg is attached. This is after some
heavy I/O on a multi-disk btrfs filesystem.
[69932.966704] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[69932.966837] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.966921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69932.967051] btrfs-transacti D 0000000000000001 0 516 2 0x00000000
[69932.967060] ffff8801f422fac0 0000000000000046 ffff880203f3bd20 00000000000145c0
[69932.967069] ffff8801f422ffd8 00000000000145c0 ffff880203f3bd20 ffff8801f422fa30
[69932.967076] ffffffffa062e392 ffff8800cda63300 ffff8802010b1e60 00000c73d1920000
[69932.967083] Call Trace:
[69932.967133] [<ffffffffa062e392>] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[69932.967146] [<ffffffff8119937a>] ? kmem_cache_alloc_trace+0x1fa/0x220
[69932.967155] [<ffffffff814fd759>] schedule+0x29/0x70
[69932.967179] [<ffffffffa05c8571>] cache_block_group+0x121/0x390 [btrfs]
[69932.967187] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[69932.967212] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[69932.967243] [<ffffffffa0606f00>] ? free_extent_buffer+0x10/0xa0 [btrfs]
[69932.967269] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[69932.967298] [<ffffffffa05ed388>] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[69932.967328] [<ffffffffa05f52b0>] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[69932.967353] [<ffffffffa05d4a97>] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[69932.967380] [<ffffffffa05e2b5d>] commit_cowonly_roots+0x18d/0x240 [btrfs]
[69932.967408] [<ffffffffa05e4c87>] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[69932.967435] [<ffffffffa05e0835>] transaction_kthread+0x1e5/0x250 [btrfs]
[69932.967462] [<ffffffffa05e0650>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[69932.967471] [<ffffffff8108c97b>] kthread+0xdb/0x100
[69932.967478] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.967486] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[69932.967493] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.967505] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
--
Konstantinos Skarlatos
[-- Attachment #2: 3.16_rc1-blocked120seconds.txt --]
[-- Type: text/plain, Size: 25695 bytes --]
[ 995.654816] BTRFS info (device sdh): force zlib compression
[ 995.654827] BTRFS info (device sdh): disk space caching is enabled
[ 995.654832] BTRFS: has skinny extents
[ 995.785405] BTRFS: bdev /dev/sda errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
[69932.966704] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[69932.966837] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.966921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69932.967051] btrfs-transacti D 0000000000000001 0 516 2 0x00000000
[69932.967060] ffff8801f422fac0 0000000000000046 ffff880203f3bd20 00000000000145c0
[69932.967069] ffff8801f422ffd8 00000000000145c0 ffff880203f3bd20 ffff8801f422fa30
[69932.967076] ffffffffa062e392 ffff8800cda63300 ffff8802010b1e60 00000c73d1920000
[69932.967083] Call Trace:
[69932.967133] [<ffffffffa062e392>] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[69932.967146] [<ffffffff8119937a>] ? kmem_cache_alloc_trace+0x1fa/0x220
[69932.967155] [<ffffffff814fd759>] schedule+0x29/0x70
[69932.967179] [<ffffffffa05c8571>] cache_block_group+0x121/0x390 [btrfs]
[69932.967187] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[69932.967212] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[69932.967243] [<ffffffffa0606f00>] ? free_extent_buffer+0x10/0xa0 [btrfs]
[69932.967269] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[69932.967298] [<ffffffffa05ed388>] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[69932.967328] [<ffffffffa05f52b0>] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[69932.967353] [<ffffffffa05d4a97>] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[69932.967380] [<ffffffffa05e2b5d>] commit_cowonly_roots+0x18d/0x240 [btrfs]
[69932.967408] [<ffffffffa05e4c87>] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[69932.967435] [<ffffffffa05e0835>] transaction_kthread+0x1e5/0x250 [btrfs]
[69932.967462] [<ffffffffa05e0650>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[69932.967471] [<ffffffff8108c97b>] kthread+0xdb/0x100
[69932.967478] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.967486] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[69932.967493] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.967505] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
[69932.967625] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.967707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69932.967835] kworker/u16:15 D 0000000000000000 0 30882 2 0x00000000
[69932.967867] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[69932.967871] ffff88003e537858 0000000000000046 ffff8801fc599e90 00000000000145c0
[69932.967878] ffff88003e537fd8 00000000000145c0 ffff8801fc599e90 0000000000000000
[69932.967884] 0000000000000000 0000000000000000 ffff8802036bd968 ffff8802036bd848
[69932.967890] Call Trace:
[69932.967900] [<ffffffff81263923>] ? __blk_run_queue+0x33/0x40
[69932.967908] [<ffffffff81264bbb>] ? queue_unplugged+0x3b/0xd0
[69932.967916] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[69932.967924] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[69932.967930] [<ffffffff814fd759>] schedule+0x29/0x70
[69932.967936] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[69932.967943] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[69932.967949] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[69932.967956] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[69932.967963] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[69932.967971] [<ffffffff8114212c>] pagecache_get_page+0xac/0x1d0
[69932.968000] [<ffffffffa0626887>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[69932.968030] [<ffffffffa06298dd>] __load_free_space_cache+0x1bd/0x680 [btrfs]
[69932.968059] [<ffffffffa0629e9c>] load_free_space_cache+0xfc/0x1c0 [btrfs]
[69932.968081] [<ffffffffa05c85e2>] cache_block_group+0x192/0x390 [btrfs]
[69932.968088] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[69932.968112] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[69932.968121] [<ffffffff8119a301>] ? kmem_cache_free+0x181/0x240
[69932.968145] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[69932.968173] [<ffffffffa05eb563>] cow_file_range+0x123/0x400 [btrfs]
[69932.968202] [<ffffffffa05ec855>] submit_compressed_extents+0x1f5/0x460 [btrfs]
[69932.968231] [<ffffffffa05ecac0>] ? submit_compressed_extents+0x460/0x460 [btrfs]
[69932.968260] [<ffffffffa05ecb46>] async_cow_submit+0x86/0x90 [btrfs]
[69932.968289] [<ffffffffa0614735>] normal_work_helper+0x205/0x350 [btrfs]
[69932.968297] [<ffffffff81085fc8>] process_one_work+0x168/0x450
[69932.968305] [<ffffffff810865eb>] worker_thread+0x6b/0x550
[69932.968313] [<ffffffff81086580>] ? init_pwq.part.22+0x10/0x10
[69932.968320] [<ffffffff8108c97b>] kthread+0xdb/0x100
[69932.968327] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.968334] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[69932.968341] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.968348] INFO: task rsync:30889 blocked for more than 120 seconds.
[69932.968455] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.968537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69932.968665] rsync D 0000000000000001 0 30889 30887 0x00000000
[69932.968671] ffff88009423bb58 0000000000000082 ffff8801cb03bd20 00000000000145c0
[69932.968677] ffff88009423bfd8 00000000000145c0 ffff8801cb03bd20 ffff88009423bab0
[69932.968683] ffffffffa0614ce8 ffff88020286e000 0000000000000001 000000000057afff
[69932.968689] Call Trace:
[69932.968719] [<ffffffffa0614ce8>] ? btrfs_queue_work+0x88/0xf0 [btrfs]
[69932.968748] [<ffffffffa05ec4a2>] ? run_delalloc_range+0x182/0x340 [btrfs]
[69932.968756] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[69932.968763] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[69932.968769] [<ffffffff814fd759>] schedule+0x29/0x70
[69932.968775] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[69932.968782] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[69932.968788] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[69932.968795] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[69932.968802] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[69932.968831] [<ffffffffa060036e>] ? flush_write_bio+0xe/0x10 [btrfs]
[69932.968860] [<ffffffffa0604268>] extent_write_cache_pages.isra.29.constprop.46+0x2b8/0x3f0 [btrfs]
[69932.968870] [<ffffffff81151d4d>] ? truncate_inode_pages_range+0x29d/0x740
[69932.968900] [<ffffffffa060641d>] extent_writepages+0x4d/0x70 [btrfs]
[69932.968928] [<ffffffffa05e8af0>] ? btrfs_direct_IO+0x360/0x360 [btrfs]
[69932.968956] [<ffffffffa05e78d8>] btrfs_writepages+0x28/0x30 [btrfs]
[69932.968964] [<ffffffff8114e92e>] do_writepages+0x1e/0x30
[69932.968972] [<ffffffff81142d89>] __filemap_fdatawrite_range+0x59/0x60
[69932.968980] [<ffffffff81142e53>] filemap_fdatawrite_range+0x13/0x20
[69932.969010] [<ffffffffa05ff37f>] btrfs_wait_ordered_range+0xff/0x150 [btrfs]
[69932.969038] [<ffffffffa05eee6a>] btrfs_truncate+0x4a/0x330 [btrfs]
[69932.969046] [<ffffffff811522ca>] ? truncate_pagecache+0x5a/0x70
[69932.969074] [<ffffffffa05efb98>] btrfs_setattr+0x228/0x2e0 [btrfs]
[69932.969083] [<ffffffff811d1fb1>] notify_change+0x221/0x380
[69932.969091] [<ffffffff811b4846>] do_truncate+0x66/0x90
[69932.969097] [<ffffffff811b8d39>] ? __sb_start_write+0x49/0xf0
[69932.969105] [<ffffffff811b4bbb>] do_sys_ftruncate.constprop.10+0x10b/0x160
[69932.969112] [<ffffffff811b4c4e>] SyS_ftruncate+0xe/0x10
[69932.969119] [<ffffffff81501429>] system_call_fastpath+0x16/0x1b
[69932.969126] INFO: task kworker/u16:20:31689 blocked for more than 120 seconds.
[69932.969245] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.969326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69932.969454] kworker/u16:20 D 0000000000000000 0 31689 2 0x00000000
[69932.969464] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
[69932.969468] ffff88009d0179c8 0000000000000046 ffff8802034bc750 00000000000145c0
[69932.969474] ffff88009d017fd8 00000000000145c0 ffff8802034bc750 ffff8802035cb1c0
[69932.969480] ffff88009d017928 ffffffffa0614ce8 ffff88020286e000 000000000000003c
[69932.969486] Call Trace:
[69932.969516] [<ffffffffa0614ce8>] ? btrfs_queue_work+0x88/0xf0 [btrfs]
[69932.969545] [<ffffffffa05ec4a2>] ? run_delalloc_range+0x182/0x340 [btrfs]
[69932.969552] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[69932.969560] [<ffffffff810d37a8>] ? ktime_get_ts+0x48/0xf0
[69932.969567] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[69932.969574] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[69932.969580] [<ffffffff814fd759>] schedule+0x29/0x70
[69932.969585] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[69932.969592] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[69932.969598] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[69932.969605] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[69932.969612] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[69932.969619] [<ffffffff8114fad1>] ? pagevec_lookup_tag+0x21/0x30
[69932.969648] [<ffffffffa060036e>] ? flush_write_bio+0xe/0x10 [btrfs]
[69932.969678] [<ffffffffa0604268>] extent_write_cache_pages.isra.29.constprop.46+0x2b8/0x3f0 [btrfs]
[69932.969709] [<ffffffffa060641d>] extent_writepages+0x4d/0x70 [btrfs]
[69932.969737] [<ffffffffa05e8af0>] ? btrfs_direct_IO+0x360/0x360 [btrfs]
[69932.969764] [<ffffffffa05e78d8>] btrfs_writepages+0x28/0x30 [btrfs]
[69932.969771] [<ffffffff8114e92e>] do_writepages+0x1e/0x30
[69932.969778] [<ffffffff811dfa70>] __writeback_single_inode+0x40/0x2b0
[69932.969785] [<ffffffff811e0e97>] writeback_sb_inodes+0x247/0x400
[69932.969792] [<ffffffff811e10ef>] __writeback_inodes_wb+0x9f/0xd0
[69932.969798] [<ffffffff811e132b>] wb_writeback+0x20b/0x330
[69932.969805] [<ffffffff811e18c4>] bdi_writeback_workfn+0x314/0x490
[69932.969814] [<ffffffff81085fc8>] process_one_work+0x168/0x450
[69932.969821] [<ffffffff810865eb>] worker_thread+0x6b/0x550
[69932.969829] [<ffffffff81086580>] ? init_pwq.part.22+0x10/0x10
[69932.969836] [<ffffffff8108c97b>] kthread+0xdb/0x100
[69932.969843] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[69932.969850] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[69932.969857] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.125951] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[70053.126012] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70053.126069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70053.126123] btrfs-transacti D 0000000000000001 0 516 2 0x00000000
[70053.126127] ffff8801f422fac0 0000000000000046 ffff880203f3bd20 00000000000145c0
[70053.126130] ffff8801f422ffd8 00000000000145c0 ffff880203f3bd20 ffff8801f422fa30
[70053.126138] ffffffffa062e392 ffff8800cda63300 ffff8802010b1e60 00000c73d1920000
[70053.126141] Call Trace:
[70053.126171] [<ffffffffa062e392>] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[70053.126179] [<ffffffff8119937a>] ? kmem_cache_alloc_trace+0x1fa/0x220
[70053.126183] [<ffffffff814fd759>] schedule+0x29/0x70
[70053.126192] [<ffffffffa05c8571>] cache_block_group+0x121/0x390 [btrfs]
[70053.126196] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[70053.126205] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[70053.126217] [<ffffffffa0606f00>] ? free_extent_buffer+0x10/0xa0 [btrfs]
[70053.126227] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[70053.126238] [<ffffffffa05ed388>] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[70053.126249] [<ffffffffa05f52b0>] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[70053.126259] [<ffffffffa05d4a97>] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[70053.126269] [<ffffffffa05e2b5d>] commit_cowonly_roots+0x18d/0x240 [btrfs]
[70053.126280] [<ffffffffa05e4c87>] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[70053.126290] [<ffffffffa05e0835>] transaction_kthread+0x1e5/0x250 [btrfs]
[70053.126301] [<ffffffffa05e0650>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[70053.126304] [<ffffffff8108c97b>] kthread+0xdb/0x100
[70053.126307] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.126310] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[70053.126313] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.126318] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
[70053.126364] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70053.126396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70053.126444] kworker/u16:15 D 0000000000000000 0 30882 2 0x00000000
[70053.126457] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[70053.126459] ffff88003e537858 0000000000000046 ffff8801fc599e90 00000000000145c0
[70053.126471] ffff88003e537fd8 00000000000145c0 ffff8801fc599e90 0000000000000000
[70053.126474] 0000000000000000 0000000000000000 ffff8802036bd968 ffff8802036bd848
[70053.126476] Call Trace:
[70053.126480] [<ffffffff81263923>] ? __blk_run_queue+0x33/0x40
[70053.126489] [<ffffffff81264bbb>] ? queue_unplugged+0x3b/0xd0
[70053.126492] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[70053.126496] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[70053.126500] [<ffffffff814fd759>] schedule+0x29/0x70
[70053.126503] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[70053.126505] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[70053.126508] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[70053.126511] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[70053.126513] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[70053.126516] [<ffffffff8114212c>] pagecache_get_page+0xac/0x1d0
[70053.126527] [<ffffffffa0626887>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[70053.126539] [<ffffffffa06298dd>] __load_free_space_cache+0x1bd/0x680 [btrfs]
[70053.126550] [<ffffffffa0629e9c>] load_free_space_cache+0xfc/0x1c0 [btrfs]
[70053.126558] [<ffffffffa05c85e2>] cache_block_group+0x192/0x390 [btrfs]
[70053.126561] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[70053.126570] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[70053.126573] [<ffffffff8119a301>] ? kmem_cache_free+0x181/0x240
[70053.126583] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[70053.126593] [<ffffffffa05eb563>] cow_file_range+0x123/0x400 [btrfs]
[70053.126604] [<ffffffffa05ec855>] submit_compressed_extents+0x1f5/0x460 [btrfs]
[70053.126616] [<ffffffffa05ecac0>] ? submit_compressed_extents+0x460/0x460 [btrfs]
[70053.126626] [<ffffffffa05ecb46>] async_cow_submit+0x86/0x90 [btrfs]
[70053.126637] [<ffffffffa0614735>] normal_work_helper+0x205/0x350 [btrfs]
[70053.126641] [<ffffffff81085fc8>] process_one_work+0x168/0x450
[70053.126644] [<ffffffff810865eb>] worker_thread+0x6b/0x550
[70053.126647] [<ffffffff81086580>] ? init_pwq.part.22+0x10/0x10
[70053.126649] [<ffffffff8108c97b>] kthread+0xdb/0x100
[70053.126652] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.126655] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[70053.126657] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.126660] INFO: task rsync:30889 blocked for more than 120 seconds.
[70053.126701] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70053.126732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70053.126781] rsync D 0000000000000001 0 30889 30887 0x00000000
[70053.126783] ffff88009423bb58 0000000000000082 ffff8801cb03bd20 00000000000145c0
[70053.126786] ffff88009423bfd8 00000000000145c0 ffff8801cb03bd20 ffff88009423bab0
[70053.126788] ffffffffa0614ce8 ffff88020286e000 0000000000000001 000000000057afff
[70053.126790] Call Trace:
[70053.126802] [<ffffffffa0614ce8>] ? btrfs_queue_work+0x88/0xf0 [btrfs]
[70053.126813] [<ffffffffa05ec4a2>] ? run_delalloc_range+0x182/0x340 [btrfs]
[70053.126816] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[70053.126819] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[70053.126821] [<ffffffff814fd759>] schedule+0x29/0x70
[70053.126823] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[70053.126826] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[70053.126828] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[70053.126831] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[70053.126833] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[70053.126844] [<ffffffffa060036e>] ? flush_write_bio+0xe/0x10 [btrfs]
[70053.126856] [<ffffffffa0604268>] extent_write_cache_pages.isra.29.constprop.46+0x2b8/0x3f0 [btrfs]
[70053.126859] [<ffffffff81151d4d>] ? truncate_inode_pages_range+0x29d/0x740
[70053.126871] [<ffffffffa060641d>] extent_writepages+0x4d/0x70 [btrfs]
[70053.126882] [<ffffffffa05e8af0>] ? btrfs_direct_IO+0x360/0x360 [btrfs]
[70053.126893] [<ffffffffa05e78d8>] btrfs_writepages+0x28/0x30 [btrfs]
[70053.126895] [<ffffffff8114e92e>] do_writepages+0x1e/0x30
[70053.126899] [<ffffffff81142d89>] __filemap_fdatawrite_range+0x59/0x60
[70053.126902] [<ffffffff81142e53>] filemap_fdatawrite_range+0x13/0x20
[70053.126913] [<ffffffffa05ff37f>] btrfs_wait_ordered_range+0xff/0x150 [btrfs]
[70053.126924] [<ffffffffa05eee6a>] btrfs_truncate+0x4a/0x330 [btrfs]
[70053.126927] [<ffffffff811522ca>] ? truncate_pagecache+0x5a/0x70
[70053.126938] [<ffffffffa05efb98>] btrfs_setattr+0x228/0x2e0 [btrfs]
[70053.126941] [<ffffffff811d1fb1>] notify_change+0x221/0x380
[70053.126945] [<ffffffff811b4846>] do_truncate+0x66/0x90
[70053.126947] [<ffffffff811b8d39>] ? __sb_start_write+0x49/0xf0
[70053.126950] [<ffffffff811b4bbb>] do_sys_ftruncate.constprop.10+0x10b/0x160
[70053.126953] [<ffffffff811b4c4e>] SyS_ftruncate+0xe/0x10
[70053.126955] [<ffffffff81501429>] system_call_fastpath+0x16/0x1b
[70053.126958] INFO: task kworker/u16:20:31689 blocked for more than 120 seconds.
[70053.127004] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70053.127035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70053.127083] kworker/u16:20 D 0000000000000000 0 31689 2 0x00000000
[70053.127088] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
[70053.127090] ffff88009d0179c8 0000000000000046 ffff8802034bc750 00000000000145c0
[70053.127092] ffff88009d017fd8 00000000000145c0 ffff8802034bc750 ffff8802035cb1c0
[70053.127094] ffff88009d017928 ffffffffa0614ce8 ffff88020286e000 000000000000003c
[70053.127097] Call Trace:
[70053.127108] [<ffffffffa0614ce8>] ? btrfs_queue_work+0x88/0xf0 [btrfs]
[70053.127119] [<ffffffffa05ec4a2>] ? run_delalloc_range+0x182/0x340 [btrfs]
[70053.127121] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[70053.127125] [<ffffffff810d37a8>] ? ktime_get_ts+0x48/0xf0
[70053.127127] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[70053.127130] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[70053.127132] [<ffffffff814fd759>] schedule+0x29/0x70
[70053.127134] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[70053.127137] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[70053.127139] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[70053.127142] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[70053.127144] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[70053.127147] [<ffffffff8114fad1>] ? pagevec_lookup_tag+0x21/0x30
[70053.127158] [<ffffffffa060036e>] ? flush_write_bio+0xe/0x10 [btrfs]
[70053.127170] [<ffffffffa0604268>] extent_write_cache_pages.isra.29.constprop.46+0x2b8/0x3f0 [btrfs]
[70053.127182] [<ffffffffa060641d>] extent_writepages+0x4d/0x70 [btrfs]
[70053.127192] [<ffffffffa05e8af0>] ? btrfs_direct_IO+0x360/0x360 [btrfs]
[70053.127203] [<ffffffffa05e78d8>] btrfs_writepages+0x28/0x30 [btrfs]
[70053.127205] [<ffffffff8114e92e>] do_writepages+0x1e/0x30
[70053.127208] [<ffffffff811dfa70>] __writeback_single_inode+0x40/0x2b0
[70053.127210] [<ffffffff811e0e97>] writeback_sb_inodes+0x247/0x400
[70053.127213] [<ffffffff811e10ef>] __writeback_inodes_wb+0x9f/0xd0
[70053.127216] [<ffffffff811e132b>] wb_writeback+0x20b/0x330
[70053.127218] [<ffffffff811e18c4>] bdi_writeback_workfn+0x314/0x490
[70053.127222] [<ffffffff81085fc8>] process_one_work+0x168/0x450
[70053.127224] [<ffffffff810865eb>] worker_thread+0x6b/0x550
[70053.127227] [<ffffffff81086580>] ? init_pwq.part.22+0x10/0x10
[70053.127230] [<ffffffff8108c97b>] kthread+0xdb/0x100
[70053.127233] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70053.127235] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[70053.127238] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70173.288526] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[70173.288580] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70173.288612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70173.288662] btrfs-transacti D 0000000000000001 0 516 2 0x00000000
[70173.288667] ffff8801f422fac0 0000000000000046 ffff880203f3bd20 00000000000145c0
[70173.288670] ffff8801f422ffd8 00000000000145c0 ffff880203f3bd20 ffff8801f422fa30
[70173.288673] ffffffffa062e392 ffff8800cda63300 ffff8802010b1e60 00000c73d1920000
[70173.288676] Call Trace:
[70173.288701] [<ffffffffa062e392>] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[70173.288707] [<ffffffff8119937a>] ? kmem_cache_alloc_trace+0x1fa/0x220
[70173.288711] [<ffffffff814fd759>] schedule+0x29/0x70
[70173.288720] [<ffffffffa05c8571>] cache_block_group+0x121/0x390 [btrfs]
[70173.288724] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[70173.288733] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[70173.288745] [<ffffffffa0606f00>] ? free_extent_buffer+0x10/0xa0 [btrfs]
[70173.288755] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[70173.288766] [<ffffffffa05ed388>] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[70173.288777] [<ffffffffa05f52b0>] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[70173.288787] [<ffffffffa05d4a97>] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[70173.288798] [<ffffffffa05e2b5d>] commit_cowonly_roots+0x18d/0x240 [btrfs]
[70173.288808] [<ffffffffa05e4c87>] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[70173.288819] [<ffffffffa05e0835>] transaction_kthread+0x1e5/0x250 [btrfs]
[70173.288829] [<ffffffffa05e0650>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[70173.288833] [<ffffffff8108c97b>] kthread+0xdb/0x100
[70173.288836] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70173.288839] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[70173.288842] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70173.288847] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
[70173.288893] Not tainted 3.16.0-rc1-ge99cfa2 #1
[70173.288924] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[70173.289004] kworker/u16:15 D 0000000000000000 0 30882 2 0x00000000
[70173.289017] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[70173.289019] ffff88003e537858 0000000000000046 ffff8801fc599e90 00000000000145c0
[70173.289022] ffff88003e537fd8 00000000000145c0 ffff8801fc599e90 0000000000000000
[70173.289024] 0000000000000000 0000000000000000 ffff8802036bd968 ffff8802036bd848
[70173.289027] Call Trace:
[70173.289031] [<ffffffff81263923>] ? __blk_run_queue+0x33/0x40
[70173.289034] [<ffffffff81264bbb>] ? queue_unplugged+0x3b/0xd0
[70173.289038] [<ffffffff8101dda9>] ? read_tsc+0x9/0x20
[70173.289042] [<ffffffff81141410>] ? filemap_fdatawait+0x30/0x30
[70173.289044] [<ffffffff814fd759>] schedule+0x29/0x70
[70173.289046] [<ffffffff814fda34>] io_schedule+0x94/0xf0
[70173.289049] [<ffffffff8114141e>] sleep_on_page+0xe/0x20
[70173.289051] [<ffffffff814fdf68>] __wait_on_bit_lock+0x48/0xb0
[70173.289054] [<ffffffff8114156a>] __lock_page+0x6a/0x70
[70173.289057] [<ffffffff810b09d0>] ? autoremove_wake_function+0x40/0x40
[70173.289060] [<ffffffff8114212c>] pagecache_get_page+0xac/0x1d0
[70173.289071] [<ffffffffa0626887>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[70173.289082] [<ffffffffa06298dd>] __load_free_space_cache+0x1bd/0x680 [btrfs]
[70173.289094] [<ffffffffa0629e9c>] load_free_space_cache+0xfc/0x1c0 [btrfs]
[70173.289102] [<ffffffffa05c85e2>] cache_block_group+0x192/0x390 [btrfs]
[70173.289105] [<ffffffff810b0990>] ? __wake_up_sync+0x20/0x20
[70173.289114] [<ffffffffa05d16fa>] find_free_extent+0x5fa/0xc80 [btrfs]
[70173.289117] [<ffffffff8119a301>] ? kmem_cache_free+0x181/0x240
[70173.289127] [<ffffffffa05d1f52>] btrfs_reserve_extent+0x62/0x140 [btrfs]
[70173.289137] [<ffffffffa05eb563>] cow_file_range+0x123/0x400 [btrfs]
[70173.289149] [<ffffffffa05ec855>] submit_compressed_extents+0x1f5/0x460 [btrfs]
[70173.289160] [<ffffffffa05ecac0>] ? submit_compressed_extents+0x460/0x460 [btrfs]
[70173.289170] [<ffffffffa05ecb46>] async_cow_submit+0x86/0x90 [btrfs]
[70173.289181] [<ffffffffa0614735>] normal_work_helper+0x205/0x350 [btrfs]
[70173.289185] [<ffffffff81085fc8>] process_one_work+0x168/0x450
[70173.289188] [<ffffffff810865eb>] worker_thread+0x6b/0x550
[70173.289192] [<ffffffff81086580>] ? init_pwq.part.22+0x10/0x10
[70173.289194] [<ffffffff8108c97b>] kthread+0xdb/0x100
[70173.289197] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
[70173.289200] [<ffffffff8150137c>] ret_from_fork+0x7c/0xb0
[70173.289202] [<ffffffff8108c8a0>] ? kthread_create_on_node+0x180/0x180
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 3:21 ` Waiman Long
@ 2014-06-19 16:51 ` Chris Mason
2014-06-19 17:52 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-19 16:51 UTC (permalink / raw)
To: Waiman Long; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/18/2014 11:21 PM, Waiman Long wrote:
> On 06/18/2014 10:11 PM, Chris Mason wrote:
>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>> Dionne<marc.c.dionne@gmail.com> wrote:
>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>> wrote:
>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>> And for an additional data point, just removing those
>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>> the symptoms when lockdep is not enabled.
>> Ok, somehow we've added a lock inversion here that wasn't here before.
>> Thanks for confirming, I'll nail it down.
>>
>> -chris
>>
>
> I am pretty sure that the hangup is caused by the following kind of code
> fragment in the locking.c file:
>
> if (eb->lock_nested) {
> read_lock(&eb->lock);
> if (eb->lock_nested && current->pid == eb->lock_owner) {
>
> Is it possible to do the check without taking the read_lock?
I think you're right, we haven't added any new recursive takers of the
lock. The path where we are deadlocking has an extent buffer that isn't
in the path yet locked. I think we're taking the read lock while that
one is write locked.
Reworking the nesting a bit here.
-chris
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 16:51 ` Chris Mason
@ 2014-06-19 17:52 ` Waiman Long
2014-06-19 20:10 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-19 17:52 UTC (permalink / raw)
To: Chris Mason; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/19/2014 12:51 PM, Chris Mason wrote:
> On 06/18/2014 11:21 PM, Waiman Long wrote:
>> On 06/18/2014 10:11 PM, Chris Mason wrote:
>>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>>> Dionne<marc.c.dionne@gmail.com> wrote:
>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>>> wrote:
>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>> And for an additional data point, just removing those
>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>>> the symptoms when lockdep is not enabled.
>>> Ok, somehow we've added a lock inversion here that wasn't here before.
>>> Thanks for confirming, I'll nail it down.
>>>
>>> -chris
>>>
>> I am pretty sure that the hangup is caused by the following kind of code
>> fragment in the locking.c file:
>>
>> if (eb->lock_nested) {
>> read_lock(&eb->lock);
>> if (eb->lock_nested && current->pid == eb->lock_owner) {
>>
>> Is it possible to do the check without taking the read_lock?
> I think you're right, we haven't added any new recursive takers of the
> lock. The path where we are deadlocking has an extent buffer that isn't
> in the path yet locked. I think we're taking the read lock while that
> one is write locked.
>
> Reworking the nesting a bit here.
>
> -chris
I would like to take back my comments. I took out the read_lock, but the
process still hangs while doing file activities on the btrfs filesystem. So
the problem is trickier than I thought. Below are the stack backtraces
of some of the relevant processes.
-Longman
INFO: rcu_sched self-detected stall on CPU { 0} (t=21000 jiffies g=4633 c=4632 q=8579)
INFO: rcu_sched self-detected stall on CPU { 10} (t=21000 jiffies g=4633 c=4632 q=8579)
INFO: rcu_sched self-detected stall on CPU { 20} (t=21000 jiffies g=4633 c=4632 q=8579)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 559 Comm: kworker/u65:8 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff88040e986510 ti: ffff88040e98c000 task.ti: ffff88040e98c000
RIP: 0010:[<ffffffff810a687d>] [<ffffffff810a687d>] do_raw_spin_unlock+0x3d/0xa0
RSP: 0018:ffff88041fc03d08 EFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff88041fc13680 RCX: ffff88040e98c010
RDX: 0000000000000acd RSI: 0000000000000001 RDI: ffff88041fc13680
RBP: ffff88041fc03d18 R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88041fc13680
R13: 0000000000000082 R14: 0000000000000000 R15: ffff88041fc0d1a0
FS: 0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f40cbd78000 CR3: 0000000001c10000 CR4: 00000000000007f0
Stack:
0000000000000000 0000000000000082 ffff88041fc03d38 ffffffff8166e7e6
ffff88041fc13680 0000000000013680 ffff88041fc03d68 ffffffff810834a0
ffffffff81c4f940 0000000000000086 ffffffff81d57d60 0000000000000000
Call Trace:
<IRQ> [<ffffffff8166e7e6>] _raw_spin_unlock_irqrestore+0x36/0x50
[<ffffffff810834a0>] resched_cpu+0x80/0x90
[<ffffffff810be01f>] print_cpu_stall+0x12f/0x140
[<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
[<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
[<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
[<ffffffff8105f938>] update_process_times+0x48/0x80
[<ffffffff810caa77>] tick_sched_handle+0x37/0x80
[<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
[<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
[<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
[<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
[<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
[<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff810a7180>] ? queue_write_lock_slowpath+0x60/0x90
[<ffffffff810a692d>] do_raw_write_lock+0x4d/0xa0
[<ffffffff8166e3b9>] _raw_write_lock+0x39/0x40
[<ffffffff812948ef>] ? btrfs_try_tree_write_lock+0x4f/0xc0
[<ffffffff812948ef>] btrfs_try_tree_write_lock+0x4f/0xc0
[<ffffffff81236e62>] btrfs_search_slot+0x422/0x870
[<ffffffff81237c7e>] btrfs_insert_empty_items+0x7e/0xe0
[<ffffffff8125f83c>] insert_reserved_file_extent.clone.0+0x13c/0x2f0
[<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
[<ffffffff81262825>] finish_ordered_fn+0x15/0x20
[<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
[<ffffffff8107052b>] process_one_work+0x1db/0x510
[<ffffffff810704b6>] ? process_one_work+0x166/0x510
[<ffffffff810717ef>] worker_thread+0x11f/0x3c0
[<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107791e>] kthread+0xde/0x100
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: 4e ad de 48 89 fb 75 42 0f b7 13 0f b7 43 02 66 39 c2 74 66 65 48 8b 04 25 c0 b9 00 00 48 39 43 10 75 46 65 8b 04 25 30 b0 00 00 <39> 43 08 75 28 48 c7 43 10 ff ff ff ff c7 43 08 ff ff ff ff 66
--------------------------------------------------------
NMI backtrace for cpu 10
CPU: 10 PID: 23844 Comm: kworker/u65:14 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff880407c26190 ti: ffff88040799c000 task.ti: ffff88040799c000
RIP: 0010:[<ffffffff8103779c>] [<ffffffff8103779c>] default_send_IPI_mask_sequence_phys+0x6c/0xf0
RSP: 0018:ffff88041fca3cf8 EFLAGS: 00000046
RAX: ffff88081faa0000 RBX: 0000000000000002 RCX: 0000000000000032
RDX: 000000000000000b RSI: 0000000000000020 RDI: 0000000000000020
RBP: ffff88041fca3d38 R08: ffffffff81d58160 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000b034
R13: ffffffff81d58160 R14: 0000000000000400 R15: 0000000000000092
FS: 0000000000000000(0000) GS:ffff88041fca0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f40ca200000 CR3: 0000000001c10000 CR4: 00000000000007e0
Stack:
0000000000000012 000000000000000a ffff88041fca3d18 0000000000000000
0000000000002183 ffffffff81d57d60 0000000000000000 ffff88041fcad1a0
ffff88041fca3d48 ffffffff8103acb7 ffff88041fca3d68 ffffffff81037941
Call Trace:
<IRQ> [<ffffffff8103acb7>] physflat_send_IPI_all+0x17/0x20
[<ffffffff81037941>] arch_trigger_all_cpu_backtrace+0x61/0xa0
[<ffffffff810bdfcc>] print_cpu_stall+0xdc/0x140
[<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
[<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
[<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
[<ffffffff8105f938>] update_process_times+0x48/0x80
[<ffffffff810caa77>] tick_sched_handle+0x37/0x80
[<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
[<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
[<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
[<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
[<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
[<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff810a7223>] ? queue_read_lock_slowpath+0x73/0x90
[<ffffffff810a69c4>] do_raw_read_lock+0x44/0x50
[<ffffffff8166e6bc>] _raw_read_lock+0x3c/0x50
[<ffffffff81294d31>] ? btrfs_clear_lock_blocking_rw+0x71/0x1d0
[<ffffffff81294d31>] btrfs_clear_lock_blocking_rw+0x71/0x1d0
[<ffffffff8122d92a>] btrfs_clear_path_blocking+0x3a/0x80
[<ffffffff81236f3d>] btrfs_search_slot+0x4fd/0x870
[<ffffffff8123873e>] btrfs_next_old_leaf+0x24e/0x480
[<ffffffff81238980>] btrfs_next_leaf+0x10/0x20
[<ffffffff8126e7f8>] __btrfs_drop_extents+0x2a8/0xe80
[<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
[<ffffffff811937e5>] ? kmem_cache_alloc+0x1a5/0x1d0
[<ffffffff8125f7a7>] insert_reserved_file_extent.clone.0+0xa7/0x2f0
[<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
[<ffffffff81262825>] finish_ordered_fn+0x15/0x20
[<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
[<ffffffff8107052b>] process_one_work+0x1db/0x510
[<ffffffff810704b6>] ? process_one_work+0x166/0x510
[<ffffffff810717ef>] worker_thread+0x11f/0x3c0
[<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107791e>] kthread+0xde/0x100
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: c2 01 4c 89 ef 48 63 d2 e8 22 8a 32 00 48 63 35 cf 26 d2 00 89 c2 48 39 f2 73 55 48 8b 04 d5 00 57 d4 81 83 fb 02 41 0f b7 0c 04 <74> 55 8b 04 25 00 c3 5f ff f6 c4 10 74 1b 66 0f 1f 44 00 00 f3
--------------------------------------------------------------------------
NMI backtrace for cpu 14
CPU: 14 PID: 23832 Comm: tar Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
task: ffff8808028f4950 ti: ffff8808028ac000 task.ti: ffff8808028ac000
RIP: 0010:[<ffffffff8135a5e7>] [<ffffffff8135a5e7>] delay_tsc+0x37/0x60
RSP: 0018:ffff88041fce3ad8 EFLAGS: 00000046
RAX: 000000002ae3de42 RBX: ffffffff82b54958 RCX: 000000000000000e
RDX: 00000000000000ff RSI: 000000002ae3d696 RDI: 0000000000000b75
RBP: ffff88041fce3ad8 R08: 000000000000000e R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000270b
R13: 0000000000000020 R14: 0000000000000020 R15: ffffffff8143d620
FS: 00007f84603367a0(0000) GS:ffff88041fce0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffea97afd8 CR3: 000000080ae98000 CR4: 00000000000007e0
Stack:
ffff88041fce3ae8 ffffffff8135a5a8 ffff88041fce3b18 ffffffff8143d5b0
ffffffff8143309c ffffffff82b54958 0000000000000074 000000000000002a
ffff88041fce3b38 ffffffff8143d646 ffffffff829cde5f ffffffff82b54958
Call Trace:
<IRQ> [<ffffffff8135a5a8>] __const_udelay+0x28/0x30
[<ffffffff8143d5b0>] wait_for_xmitr+0x30/0xa0
[<ffffffff8143309c>] ? vt_console_print+0x2cc/0x3b0
[<ffffffff8143d646>] serial8250_console_putchar+0x26/0x40
[<ffffffff814387ae>] uart_console_write+0x3e/0x70
[<ffffffff8143fcf6>] serial8250_console_write+0xb6/0x180
[<ffffffff810b2635>] call_console_drivers.clone.2+0xa5/0x100
[<ffffffff810b2773>] console_cont_flush.clone.0+0xe3/0x190
[<ffffffff810b2858>] console_unlock+0x38/0x310
[<ffffffff810b1d57>] ? __down_trylock_console_sem+0x47/0x50
[<ffffffff810b303d>] ? vprintk_emit+0x2bd/0x5d0
[<ffffffff810b305e>] vprintk_emit+0x2de/0x5d0
[<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
[<ffffffff81669486>] printk+0x4d/0x4f
[<ffffffff810bdf1d>] print_cpu_stall+0x2d/0x140
[<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
[<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
[<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
[<ffffffff8105f938>] update_process_times+0x48/0x80
[<ffffffff810caa77>] tick_sched_handle+0x37/0x80
[<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
[<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
[<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
[<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
[<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
[<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff810a7223>] ? queue_read_lock_slowpath+0x73/0x90
[<ffffffff810a69c4>] do_raw_read_lock+0x44/0x50
[<ffffffff8166e6bc>] _raw_read_lock+0x3c/0x50
[<ffffffff81294be8>] ? btrfs_tree_read_lock+0x58/0x130
[<ffffffff81294be8>] btrfs_tree_read_lock+0x58/0x130
[<ffffffff8122d7ab>] btrfs_read_lock_root_node+0x3b/0x50
[<ffffffff8123709c>] btrfs_search_slot+0x65c/0x870
[<ffffffff8125cef8>] ? btrfs_set_bit_hook+0xd8/0x150
[<ffffffff8124b76d>] btrfs_lookup_dir_item+0x7d/0xd0
[<ffffffff8126a895>] btrfs_lookup_dentry+0xb5/0x390
[<ffffffff8166e82b>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff8126ab86>] btrfs_lookup+0x16/0x40
[<ffffffff811ac3ad>] lookup_real+0x1d/0x60
[<ffffffff811aeac4>] lookup_open+0xc4/0x1c0
[<ffffffff811b0428>] ? do_last+0x338/0x8c0
[<ffffffff811b0442>] do_last+0x352/0x8c0
[<ffffffff811acc00>] ? __inode_permission+0x90/0xd0
[<ffffffff811b3224>] path_openat+0xc4/0x480
[<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
[<ffffffff810a082b>] ? cpuacct_charge+0x6b/0x90
[<ffffffff811c0b86>] ? __alloc_fd+0x36/0x150
[<ffffffff811b371a>] do_filp_open+0x4a/0xa0
[<ffffffff811c0bfc>] ? __alloc_fd+0xac/0x150
[<ffffffff810f4254>] ? __audit_syscall_entry+0x94/0x100
[<ffffffff811a131a>] do_sys_open+0x11a/0x230
[<ffffffff811a146e>] SyS_open+0x1e/0x20
[<ffffffff8166ee12>] system_call_fastpath+0x16/0x1b
Code: 04 25 30 b0 00 00 66 66 90 0f ae e8 0f 31 89 c6 eb 11 66 90 f3 90 65 8b 0c 25 30 b0 00 00 44 39 c1 75 12 66 66 90 0f ae e8 0f 31 <89> c2 29 f2 39 fa 72 e1 c9 c3 29 c6 01 f7 66 66 90 0f ae e8 0f
-----------------------------------------------------------------------
NMI backtrace for cpu 20
CPU: 20 PID: 154 Comm: kworker/u65:1 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff88040d83e190 ti: ffff88040d840000 task.ti: ffff88040d840000
RIP: 0010:[<ffffffff810a7184>] [<ffffffff810a7184>] queue_write_lock_slowpath+0x64/0x90
RSP: 0018:ffff88040d843838 EFLAGS: 00000206
RAX: 0000000000000101 RBX: ffff8807f7b51bc8 RCX: 0000000000000101
RDX: 00000000000000ff RSI: ffff88040d83edc8 RDI: ffff8807f7b51bc8
RBP: ffff88040d843838 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8807f7b51bb4
R13: ffff8807f7b51c00 R14: ffff8807f7b51bb8 R15: ffff8807f7b51c48
FS: 0000000000000000(0000) GS:ffff88041fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fc824402008 CR3: 0000000001c10000 CR4: 00000000000007e0
Stack:
ffff88040d843858 ffffffff810a692d ffff8807f7b51bb8 ffff8807f7b51bc8
ffff88040d843888 ffffffff8166e3b9 ffffffff81294769 ffff8807f7b51bb4
ffff8807f7b51c00 ffff88040d8438b8 ffff88040d843918 ffffffff81294769
Call Trace:
[<ffffffff810a692d>] do_raw_write_lock+0x4d/0xa0
[<ffffffff8166e3b9>] _raw_write_lock+0x39/0x40
[<ffffffff81294769>] ? btrfs_tree_lock+0xf9/0x230
[<ffffffff81294769>] btrfs_tree_lock+0xf9/0x230
[<ffffffff8122d70e>] ? btrfs_root_node+0x5e/0xc0
[<ffffffff8109a850>] ? bit_waitqueue+0xe0/0xe0
[<ffffffff8122d8db>] btrfs_lock_root_node+0x3b/0x50
[<ffffffff812370ef>] btrfs_search_slot+0x6af/0x870
[<ffffffff81294600>] ? btrfs_tree_unlock+0x70/0xe0
[<ffffffff8129462e>] ? btrfs_tree_unlock+0x9e/0xe0
[<ffffffff810a42c4>] ? __lock_release+0x84/0x180
[<ffffffff8124bf9d>] btrfs_lookup_file_extent+0x3d/0x40
[<ffffffff8126e6a2>] __btrfs_drop_extents+0x152/0xe80
[<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
[<ffffffff810bb412>] ? rcu_resched+0x22/0x30
[<ffffffff8166a316>] ? _cond_resched+0x36/0x60
[<ffffffff811937e5>] ? kmem_cache_alloc+0x1a5/0x1d0
[<ffffffff8125f7a7>] insert_reserved_file_extent.clone.0+0xa7/0x2f0
[<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
[<ffffffff81262825>] finish_ordered_fn+0x15/0x20
[<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
[<ffffffff8107052b>] process_one_work+0x1db/0x510
[<ffffffff810704b6>] ? process_one_work+0x166/0x510
[<ffffffff810717ef>] worker_thread+0x11f/0x3c0
[<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107791e>] kthread+0xde/0x100
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
[<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: 83 47 04 01 c9 c3 90 f3 90 8b 17 84 d2 75 f8 89 d1 89 d0 83 c9 01 f0 0f b1 0f 39 d0 75 e9 ba ff 00 00 00 eb 04 66 90 f3 90 8b 07 <83> f8 01 75 f7 f0 0f b1 17 83 f8 01 75 ee eb c4 f3 90 0f b7 01
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 17:52 ` Waiman Long
@ 2014-06-19 20:10 ` Chris Mason
2014-06-19 21:50 ` Chris Mason
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-19 20:10 UTC (permalink / raw)
To: Waiman Long; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/19/2014 01:52 PM, Waiman Long wrote:
> On 06/19/2014 12:51 PM, Chris Mason wrote:
>> On 06/18/2014 11:21 PM, Waiman Long wrote:
>>> On 06/18/2014 10:11 PM, Chris Mason wrote:
>>>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>>>> Dionne<marc.c.dionne@gmail.com> wrote:
>>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>>>> wrote:
>>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>>> And for an additional data point, just removing those
>>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>>>> the symptoms when lockdep is not enabled.
>>>> Ok, somehow we've added a lock inversion here that wasn't here before.
>>>> Thanks for confirming, I'll nail it down.
>>>>
>>>> -chris
>>>>
>>> I am pretty sure that the hangup is caused by the following kind of code
>>> fragment in the locking.c file:
>>>
>>> if (eb->lock_nested) {
>>> read_lock(&eb->lock);
>>> if (eb->lock_nested && current->pid == eb->lock_owner) {
>>>
>>> Is it possible to do the check without taking the read_lock?
>> I think you're right, we haven't added any new recursive takers of the
>> lock. The path where we are deadlocking has an extent buffer that isn't
>> in the path yet locked. I think we're taking the read lock while that
>> one is write locked.
>>
>> Reworking the nesting a bit here.
>>
>> -chris
>
> I would like to take back my comments. I took out the read_lock, but the
> process still hangs while doing file activities on the btrfs filesystem. So
> the problem is trickier than I thought. Below are the stack backtraces
> of some of the relevant processes.
>
You weren't wrong, but it was also the tree trylock code. Our trylocks
only back off if the blocking lock is held. btrfs_next_leaf needs it to
be a true trylock. The confusing part is this hasn't really changed,
but one of the callers must be a spinner where we used to have a blocker.
-chris
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 20:10 ` Chris Mason
@ 2014-06-19 21:50 ` Chris Mason
2014-06-19 23:21 ` Waiman Long
0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2014-06-19 21:50 UTC (permalink / raw)
To: Waiman Long; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/19/2014 04:10 PM, Chris Mason wrote:
> On 06/19/2014 01:52 PM, Waiman Long wrote:
>> On 06/19/2014 12:51 PM, Chris Mason wrote:
>>> On 06/18/2014 11:21 PM, Waiman Long wrote:
>>>> On 06/18/2014 10:11 PM, Chris Mason wrote:
>>>>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>>>>> Dionne<marc.c.dionne@gmail.com> wrote:
>>>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>>>>> wrote:
>>>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>>>> And for an additional data point, just removing those
>>>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>>>>> the symptoms when lockdep is not enabled.
>>>>> Ok, somehow we've added a lock inversion here that wasn't here before.
>>>>> Thanks for confirming, I'll nail it down.
>>>>>
>>>>> -chris
>>>>>
>>>> I am pretty sure that the hangup is caused by the following kind of code
>>>> fragment in the locking.c file:
>>>>
>>>> if (eb->lock_nested) {
>>>> read_lock(&eb->lock);
>>>> if (eb->lock_nested && current->pid == eb->lock_owner) {
>>>>
>>>> Is it possible to do the check without taking the read_lock?
>>> I think you're right, we haven't added any new recursive takers of the
>>> lock. The path where we are deadlocking has an extent buffer that isn't
>>> in the path yet locked. I think we're taking the read lock while that
>>> one is write locked.
>>>
>>> Reworking the nesting a bit here.
>>>
>>> -chris
>>
>> I would like to take back my comments. I took out the read_lock, but the
>> process still hangs while doing file activities on the btrfs filesystem. So
>> the problem is trickier than I thought. Below are the stack backtraces
>> of some of the relevant processes.
>>
>
> You weren't wrong, but it was also the tree trylock code. Our trylocks
> only back off if the blocking lock is held. btrfs_next_leaf needs it to
> be a true trylock. The confusing part is this hasn't really changed,
> but one of the callers must be a spinner where we used to have a blocker.
This is what I have queued up, it's working here.
-chris
commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
Author: Chris Mason <clm@fb.com>
Date: Thu Jun 19 14:16:52 2014 -0700
Btrfs: fix deadlocks with trylock on tree nodes
The Btrfs tree trylock function is poorly named. It always takes
the spinlock and backs off if the blocking lock is held. This
can lead to surprising lockups because people expect it to really be a
trylock.
This commit makes it a pure trylock, both for the spinlock and the
blocking lock. It also reworks the nested lock handling slightly to
avoid taking the read lock while a spinning write lock might be held.
Signed-off-by: Chris Mason <clm@fb.com>
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 01277b8..5665d21 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -33,14 +33,14 @@ static void btrfs_assert_tree_read_locked(struct extent_buffer *eb);
*/
void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
{
- if (eb->lock_nested) {
- read_lock(&eb->lock);
- if (eb->lock_nested && current->pid == eb->lock_owner) {
- read_unlock(&eb->lock);
- return;
- }
- read_unlock(&eb->lock);
- }
+ /*
+ * no lock is required. The lock owner may change if
+ * we have a read lock, but it won't change to or away
+ * from us. If we have the write lock, we are the owner
+ * and it'll never change.
+ */
+ if (eb->lock_nested && current->pid == eb->lock_owner)
+ return;
if (rw == BTRFS_WRITE_LOCK) {
if (atomic_read(&eb->blocking_writers) == 0) {
WARN_ON(atomic_read(&eb->spinning_writers) != 1);
@@ -65,14 +65,15 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
*/
void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
{
- if (eb->lock_nested) {
- read_lock(&eb->lock);
- if (eb->lock_nested && current->pid == eb->lock_owner) {
- read_unlock(&eb->lock);
- return;
- }
- read_unlock(&eb->lock);
- }
+ /*
+ * no lock is required. The lock owner may change if
+ * we have a read lock, but it won't change to or away
+ * from us. If we have the write lock, we are the owner
+ * and it'll never change.
+ */
+ if (eb->lock_nested && current->pid == eb->lock_owner)
+ return;
+
if (rw == BTRFS_WRITE_LOCK_BLOCKING) {
BUG_ON(atomic_read(&eb->blocking_writers) != 1);
write_lock(&eb->lock);
@@ -99,6 +100,9 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
void btrfs_tree_read_lock(struct extent_buffer *eb)
{
again:
+ BUG_ON(!atomic_read(&eb->blocking_writers) &&
+ current->pid == eb->lock_owner);
+
read_lock(&eb->lock);
if (atomic_read(&eb->blocking_writers) &&
current->pid == eb->lock_owner) {
@@ -132,7 +136,9 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
if (atomic_read(&eb->blocking_writers))
return 0;
- read_lock(&eb->lock);
+ if (!read_trylock(&eb->lock))
+ return 0;
+
if (atomic_read(&eb->blocking_writers)) {
read_unlock(&eb->lock);
return 0;
@@ -151,7 +157,10 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
if (atomic_read(&eb->blocking_writers) ||
atomic_read(&eb->blocking_readers))
return 0;
- write_lock(&eb->lock);
+
+ if (!write_trylock(&eb->lock))
+ return 0;
+
if (atomic_read(&eb->blocking_writers) ||
atomic_read(&eb->blocking_readers)) {
write_unlock(&eb->lock);
@@ -168,14 +177,15 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
*/
void btrfs_tree_read_unlock(struct extent_buffer *eb)
{
- if (eb->lock_nested) {
- read_lock(&eb->lock);
- if (eb->lock_nested && current->pid == eb->lock_owner) {
- eb->lock_nested = 0;
- read_unlock(&eb->lock);
- return;
- }
- read_unlock(&eb->lock);
+ /*
+ * if we're nested, we have the write lock. No new locking
+ * is needed as long as we are the lock owner.
+ * The write unlock will do a barrier for us, and the lock_nested
+ * field only matters to the lock owner.
+ */
+ if (eb->lock_nested && current->pid == eb->lock_owner) {
+ eb->lock_nested = 0;
+ return;
}
btrfs_assert_tree_read_locked(eb);
WARN_ON(atomic_read(&eb->spinning_readers) == 0);
@@ -189,14 +199,15 @@ void btrfs_tree_read_unlock(struct extent_buffer *eb)
*/
void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
{
- if (eb->lock_nested) {
- read_lock(&eb->lock);
- if (eb->lock_nested && current->pid == eb->lock_owner) {
- eb->lock_nested = 0;
- read_unlock(&eb->lock);
- return;
- }
- read_unlock(&eb->lock);
+ /*
+ * if we're nested, we have the write lock. No new locking
+ * is needed as long as we are the lock owner.
+ * The write unlock will do a barrier for us, and the lock_nested
+ * field only matters to the lock owner.
+ */
+ if (eb->lock_nested && current->pid == eb->lock_owner) {
+ eb->lock_nested = 0;
+ return;
}
btrfs_assert_tree_read_locked(eb);
WARN_ON(atomic_read(&eb->blocking_readers) == 0);
@@ -244,6 +255,7 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
BUG_ON(blockers > 1);
btrfs_assert_tree_locked(eb);
+ eb->lock_owner = 0;
atomic_dec(&eb->write_locks);
if (blockers) {
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 21:50 ` Chris Mason
@ 2014-06-19 23:21 ` Waiman Long
2014-06-20 3:20 ` Tsutomu Itoh
0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-06-19 23:21 UTC (permalink / raw)
To: Chris Mason; +Cc: Marc Dionne, Josef Bacik, linux-btrfs, t-itoh
On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>
>>> I would like to take back my comments. I took out the read_lock, but the
>>> process still hangs while doing file activities on a btrfs filesystem. So
>>> the problem is trickier than I thought. Below are the stack backtraces
>>> of some of the relevant processes.
>>>
>> You weren't wrong, but it was also the tree trylock code. Our trylocks
>> only back off if the blocking lock is held. btrfs_next_leaf needs it to
>> be a true trylock. The confusing part is this hasn't really changed,
>> but one of the callers must be a spinner where we used to have a blocker.
> This is what I have queued up, it's working here.
>
> -chris
>
> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
> Author: Chris Mason<clm@fb.com>
> Date: Thu Jun 19 14:16:52 2014 -0700
>
> Btrfs: fix deadlocks with trylock on tree nodes
>
> The Btrfs tree trylock function is poorly named. It always takes
> the spinlock and backs off if the blocking lock is held. This
> can lead to surprising lockups because people expect it to really be a
> trylock.
>
> This commit makes it a pure trylock, both for the spinlock and the
> blocking lock. It also reworks the nested lock handling slightly to
> avoid taking the read lock while a spinning write lock might be held.
>
> Signed-off-by: Chris Mason<clm@fb.com>
I didn't realize that those non-blocking lock functions are really
trylocks. Yes, the patch did seem to fix the hanging problem that I saw
when I just untarred the kernel source files into a btrfs filesystem.
However, when I did a kernel build on a 24-thread system (-j 24),
the build process hung after a while. Stack trace messages like the
following were printed:
INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
Tainted: G E 3.16.0-rc1 #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 000000000000000f 0 16576 2 0x00000000
ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
Call Trace:
[<ffffffff81592de9>] schedule+0x29/0x70
[<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
[<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
[<ffffffff8106d595>] ? insert_work+0x65/0xb0
[<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
[<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
[<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
[<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
[<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
[<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
[<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
[<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
[<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
[<ffffffff8107544e>] kthread+0xce/0xf0
[<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
It looks like some more work may still be needed. Or it could be a
problem in my system configuration.
-Longman
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-19 23:21 ` Waiman Long
@ 2014-06-20 3:20 ` Tsutomu Itoh
2014-06-21 1:09 ` Long, Wai Man
0 siblings, 1 reply; 23+ messages in thread
From: Tsutomu Itoh @ 2014-06-20 3:20 UTC (permalink / raw)
To: Chris Mason; +Cc: Waiman Long, Marc Dionne, Josef Bacik, linux-btrfs
On 2014/06/20 8:21, Waiman Long wrote:
> On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>>
>>>> I would like to take back my comments. I took out the read_lock, but the
>>>> process still hangs while doing file activities on a btrfs filesystem. So
>>>> the problem is trickier than I thought. Below are the stack backtraces
>>>> of some of the relevant processes.
>>>>
>>> You weren't wrong, but it was also the tree trylock code. Our trylocks
>>> only back off if the blocking lock is held. btrfs_next_leaf needs it to
>>> be a true trylock. The confusing part is this hasn't really changed,
>>> but one of the callers must be a spinner where we used to have a blocker.
>> This is what I have queued up, it's working here.
>>
>> -chris
>>
>> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>> Author: Chris Mason<clm@fb.com>
>> Date: Thu Jun 19 14:16:52 2014 -0700
>>
>> Btrfs: fix deadlocks with trylock on tree nodes
>>
>> The Btrfs tree trylock function is poorly named. It always takes
>> the spinlock and backs off if the blocking lock is held. This
>> can lead to surprising lockups because people expect it to really be a
>> trylock.
>>
>> This commit makes it a pure trylock, both for the spinlock and the
>> blocking lock. It also reworks the nested lock handling slightly to
>> avoid taking the read lock while a spinning write lock might be held.
>>
>> Signed-off-by: Chris Mason<clm@fb.com>
>
> I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untarred the kernel source files into a btrfs filesystem. However, when I did a kernel build on a 24-thread system (-j 24), the build process hung after a while. Stack trace messages like the following were printed:
>
> INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
> Tainted: G E 3.16.0-rc1 #5
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 000000000000000f 0 16576 2 0x00000000
> ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
> 0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
> ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
> Call Trace:
> [<ffffffff81592de9>] schedule+0x29/0x70
> [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
> [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
> [<ffffffff8106d595>] ? insert_work+0x65/0xb0
> [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
> [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
> [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
> [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
> [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
> [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
> [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
> [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
> [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
> [<ffffffff8107544e>] kthread+0xce/0xf0
> [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>
> It looks like some more work may still be needed. Or it could be a problem in my system configuration.
>
Umm, after applying Chris's patch to my environment, xfstests ran
to completion and the above messages were not output. (Are the above
messages another bug?)
Thanks,
Tsutomu
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Lockups with btrfs on 3.16-rc1 - bisected
2014-06-20 3:20 ` Tsutomu Itoh
@ 2014-06-21 1:09 ` Long, Wai Man
0 siblings, 0 replies; 23+ messages in thread
From: Long, Wai Man @ 2014-06-21 1:09 UTC (permalink / raw)
To: Tsutomu Itoh, Chris Mason
Cc: Marc Dionne, Josef Bacik, linux-btrfs@vger.kernel.org
Hi,
It may be caused by a corrupted btrfs filesystem resulting from the repeated hangs during the test. As long as you guys don't see any problem, I am happy with the patch.
Thanks,
Longman
-----Original Message-----
From: Tsutomu Itoh [mailto:t-itoh@jp.fujitsu.com]
Sent: Thursday, June 19, 2014 11:21 PM
To: Chris Mason
Cc: Long, Wai Man; Marc Dionne; Josef Bacik; linux-btrfs@vger.kernel.org
Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected
On 2014/06/20 8:21, Waiman Long wrote:
> On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>>
>>>> I would like to take back my comments. I took out the read_lock,
>>>> but the process still hangs while doing file activities on a btrfs
>>>> filesystem. So the problem is trickier than I thought. Below are
>>>> the stack backtraces of some of the relevant processes.
>>>>
>>> You weren't wrong, but it was also the tree trylock code. Our
>>> trylocks only back off if the blocking lock is held.
>>> btrfs_next_leaf needs it to be a true trylock. The confusing part
>>> is this hasn't really changed, but one of the callers must be a spinner where we used to have a blocker.
>> This is what I have queued up, it's working here.
>>
>> -chris
>>
>> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>> Author: Chris Mason<clm@fb.com>
>> Date: Thu Jun 19 14:16:52 2014 -0700
>>
>> Btrfs: fix deadlocks with trylock on tree nodes
>>
>> The Btrfs tree trylock function is poorly named. It always takes
>> the spinlock and backs off if the blocking lock is held. This
>> can lead to surprising lockups because people expect it to really be a
>> trylock.
>>
>> This commit makes it a pure trylock, both for the spinlock and the
>> blocking lock. It also reworks the nested lock handling slightly to
>> avoid taking the read lock while a spinning write lock might be held.
>>
>> Signed-off-by: Chris Mason<clm@fb.com>
>
> I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untarred the kernel source files into a btrfs filesystem. However, when I did a kernel build on a 24-thread system (-j 24), the build process hung after a while. Stack trace messages like the following were printed:
>
> INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
> Tainted: G E 3.16.0-rc1 #5
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 000000000000000f 0 16576 2 0x00000000
> ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
> 0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
> ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
> Call Trace:
> [<ffffffff81592de9>] schedule+0x29/0x70
> [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
> [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
> [<ffffffff8106d595>] ? insert_work+0x65/0xb0
> [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
> [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
> [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
> [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
> [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
> [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
> [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
> [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
> [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
> [<ffffffff8107544e>] kthread+0xce/0xf0
> [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>
> It looks like some more work may still be needed. Or it could be a problem in my system configuration.
>
Umm, after applying Chris's patch to my environment, xfstests ran to completion and the above messages were not output. (Are the above messages another bug?)
Thanks,
Tsutomu
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2014-06-21 1:12 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-18 20:57 Lockups with btrfs on 3.16-rc1 - bisected Marc Dionne
2014-06-18 22:17 ` Waiman Long
2014-06-18 22:27 ` Josef Bacik
2014-06-18 22:47 ` Waiman Long
2014-06-18 23:10 ` Josef Bacik
2014-06-18 23:19 ` Waiman Long
2014-06-18 23:27 ` Chris Mason
2014-06-18 23:30 ` Waiman Long
2014-06-18 23:53 ` Chris Mason
2014-06-19 0:03 ` Marc Dionne
2014-06-19 0:08 ` Waiman Long
2014-06-19 0:41 ` Marc Dionne
2014-06-19 2:03 ` Marc Dionne
2014-06-19 2:11 ` Chris Mason
2014-06-19 3:21 ` Waiman Long
2014-06-19 16:51 ` Chris Mason
2014-06-19 17:52 ` Waiman Long
2014-06-19 20:10 ` Chris Mason
2014-06-19 21:50 ` Chris Mason
2014-06-19 23:21 ` Waiman Long
2014-06-20 3:20 ` Tsutomu Itoh
2014-06-21 1:09 ` Long, Wai Man
2014-06-19 9:49 ` btrfs-transacti:516 blocked 120 seconds on 3.16-rc1 Konstantinos Skarlatos