From: Marc Dionne <marc.c.dionne@gmail.com>
To: Waiman Long <waiman.long@hp.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <jbacik@fb.com>,
linux-btrfs@vger.kernel.org, t-itoh@jp.fujitsu.com
Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected
Date: Wed, 18 Jun 2014 20:41:48 -0400
Message-ID: <CAB9dFdvD9QMZ_rONhhP5ZD2E7+A7qxnsY978Snc3zHEbdPRszw@mail.gmail.com>
In-Reply-To: <53A22A01.7080505@hp.com>
On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>
>> On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason <clm@fb.com> wrote:
>>>
>>> On 06/18/2014 07:30 PM, Waiman Long wrote:
>>>>
>>>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>>>>
>>>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>>>>
>>>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>>>
>>>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>>>>
>>>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>>>
>>>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>>>>
>>>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1 similar
>>>>>>>>>>> to what is reported here:
>>>>>>>>>>>
>>>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2
>>>>>>>>>>>
>>>>>>>>>>> along with the occasional hard lockup, making it impossible to
>>>>>>>>>>> complete a parallel build on a btrfs filesystem for the package I
>>>>>>>>>>> work on. This was working fine just a few days before rc1.
>>>>>>>>>>>
>>>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>>>
>>>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>>>> Author: Waiman Long <Waiman.Long@hp.com>
>>>>>>>>>>> Date: Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>>>
>>>>>>>>>>> x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>>>
>>>>>>>>>>> And sure enough, if I revert that commit on top of current mainline,
>>>>>>>>>>> I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>>>
>>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> The queue rwlock is fair. As a result, recursive read_lock is not
>>>>>>>>>> allowed unless the task is in an interrupt context. Doing a recursive
>>>>>>>>>> read_lock will hang the process when a write_lock happens somewhere in
>>>>>>>>>> between. Are recursive read_locks being done in the btrfs code?
>>>>>>>>>>
>>>>>>>>> We walk down a tree and read lock each node as we walk down; is that
>>>>>>>>> what you mean? Or do you mean taking read_lock multiple times on the
>>>>>>>>> same lock in the same process? Because we definitely don't do that.
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>
>>>>>>>> I meant recursively read_lock the same lock in a process.
>>>>>>>
>>>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>>>
>>>>>>> Josef
>>>>>>
>>>>>> This is what I thought when I looked at the locking code in btrfs. The
>>>>>> unlock code doesn't clear the lock_owner pid, which may cause
>>>>>> lock_nested to be set incorrectly.
>>>>>>
>>>>>> Anyway, are you going to do something about it?
>>>>>
>>>>> Thanks for reporting this; we shouldn't actually be taking the lock
>>>>> recursively. Could you please try with lockdep enabled? If the problem
>>>>> goes away with lockdep on, I think I know what's causing it. Otherwise,
>>>>> lockdep should clue us in.
>>>>>
>>>>> -chris
>>>>
>>>> I am not sure if lockdep will report a recursive read_lock, as this was
>>>> allowed in the past. If not, we certainly need to add that capability
>>>> to it.
>>>>
>>>> One more thing: I saw a comment in the btrfs tree locking code about
>>>> taking a read lock after taking a write (partial?) lock. That is not
>>>> possible even with the old rwlock code.
>>>
>>> With lockdep on, the clear_path_blocking function you're hitting
>>> softlockups in is different. Fujitsu hit a similar problem during
>>> quota rescans, and it goes away with lockdep on. I'm trying to nail
>>> down where we went wrong, but please try with lockdep on.
>>>
>>> -chris
>>
>> With lockdep on I'm unable to reproduce the lockups, and there are no
>> lockdep warnings.
>>
>> Marc
>
>
> Enabling lockdep may change the lock timing, which can make the problem
> hard to reproduce. Anyway, could you try applying the following patch to
> see if it shows any warnings?
>
> -Longman
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index d24e433..b6c9f2e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_loc
>  		if (hlock_class(prev) != hlock_class(next))
>  			continue;
>  
> +#ifdef CONFIG_QUEUE_RWLOCK
> +		/*
> +		 * Queue rwlock only allows read-after-read recursion of the
> +		 * same lock class when the latter read is in an interrupt
> +		 * context.
> +		 */
> +		if ((read == 2) && prev->read && in_interrupt())
> +			return 2;
> +#else
>  		/*
>  		 * Allow read-after-read recursion of the same
>  		 * lock class (i.e. read_lock(lock)+read_lock(lock)):
>  		 */
>  		if ((read == 2) && prev->read)
>  			return 2;
> +#endif
>  
>  		/*
>  		 * We're holding the nest_lock, which serializes this lock's
> @@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
>  	 * write-lock never takes any other locks, then the reads are
>  	 * equivalent to a NOP.
>  	 */
> +#ifndef CONFIG_QUEUE_RWLOCK
>  	if (next->read == 2 || prev->read == 2)
>  		return 1;
> +#endif
>  	/*
>  	 * Is the <prev> -> <next> dependency already present?
>  	 *
I still don't see any warnings with this patch added. I also tried it
along with removing a couple of ifdefs on CONFIG_DEBUG_LOCK_ALLOC in
btrfs/ctree.c, and was still unable to generate any warnings or lockups.

Marc
Thread overview: 23+ messages
2014-06-18 20:57 Lockups with btrfs on 3.16-rc1 - bisected Marc Dionne
2014-06-18 22:17 ` Waiman Long
2014-06-18 22:27 ` Josef Bacik
2014-06-18 22:47 ` Waiman Long
2014-06-18 23:10 ` Josef Bacik
2014-06-18 23:19 ` Waiman Long
2014-06-18 23:27 ` Chris Mason
2014-06-18 23:30 ` Waiman Long
2014-06-18 23:53 ` Chris Mason
2014-06-19 0:03 ` Marc Dionne
2014-06-19 0:08 ` Waiman Long
2014-06-19 0:41 ` Marc Dionne [this message]
2014-06-19 2:03 ` Marc Dionne
2014-06-19 2:11 ` Chris Mason
2014-06-19 3:21 ` Waiman Long
2014-06-19 16:51 ` Chris Mason
2014-06-19 17:52 ` Waiman Long
2014-06-19 20:10 ` Chris Mason
2014-06-19 21:50 ` Chris Mason
2014-06-19 23:21 ` Waiman Long
2014-06-20 3:20 ` Tsutomu Itoh
2014-06-21 1:09 ` Long, Wai Man
2014-06-19 9:49 ` btrfs-transacti:516 blocked 120 seconds on 3.16-rc1 Konstantinos Skarlatos