From: Waiman Long <longman@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Alexei Starovoitov <ast@kernel.org>,
davem@davemloft.net, daniel@iogearbox.net, edumazet@google.com,
jannh@google.com, netdev@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
Date: Wed, 30 Jan 2019 21:48:27 -0500 [thread overview]
Message-ID: <e7748be1-8b0b-6613-159f-1c45f932c617@redhat.com> (raw)
In-Reply-To: <20190131020150.zbys2r6me7em2jnl@ast-mbp.dhcp.thefacebook.com>
On 01/30/2019 09:01 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote:
>> On 01/30/2019 04:11 PM, Waiman Long wrote:
>>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>>>> Lockdep warns about false positive:
>>>>>>> This is not a false positive, and you probably also need to use
>>>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>>>
>>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>>>> relies on 'owner' tracking.
>>>>>> Can you point out in the code the spin bit?
>>>>>> As far as I can see sem->owner is debug only feature.
>>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>>>> No, sem->owner is mainly for performing optimistic spinning which is a
>>>>> performance feature to make rwsem writer-lock performs similar to mutex.
>>>>> The debugging part is just an add-on. It is not the reason for the
>>>>> presence of sem->owner.
>>>> I see. Got it.
>>>>
>>>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>>>> with this patch today.
>>>>>>
>>>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>>>> should be easy to do.
>>>> Yes, but looking through the code it's not clear to me that it's safe
>>>> to mix non_owner() versions with regular.
>>>> bpf/stackmap.c does down_read_trylock + up_read.
>>>> If we add new down_read_trylock_non_owner that set the owner to
>>>> NULL | RWSEM_* bits is this safe with conccurent read/write
>>>> that do regular versions?
>>>> rwsem_can_spin_on_owner() does:
>>>> if (owner) {
>>>> ret = is_rwsem_owner_spinnable(owner) &&
>>>> owner_on_cpu(owner);
>>>> that looks correct.
>>>> For a second I thought there could be fault here due to non_owner.
>>>> But there could be other places where it's assumed that owner
>>>> is never null?
>>> The content of owner is not the cause of the lockdep warning. The
>>> lockdep code assumes that the task that acquires the lock will release
>>> it some time later. That is not the case when you need to acquire the
>>> lock by one task and released by another. In this case, you have to use
>>> the non_owner version of down/up_read which disable the lockdep
>>> acquire/release tracking. That will be the only difference between the
>>> two set of APIs.
>>>
>>> Cheers,
>>> Longman
>> BTW, you may want to do something like that to make sure that the lock
>> ownership is probably tracked.
>>
>> -Longman
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index d43b145..79eef9d 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct
>> bpf_stack_
>> } else {
>> work->sem = ¤t->mm->mmap_sem;
>> irq_work_queue(&work->irq_work);
>> +
>> + /*
>> + * The irq_work will release the mmap_sem with
>> + * up_read_non_owner(). The rwsem_release() is called
>> + * here to release the lock from lockdep's perspective.
>> + */
>> + rwsem_release(¤t->mm->mmap_sem.dep_map, 1, _RET_IP_);
> are you saying the above is enough coupled with up_read_non_owner?
> Looking at how amdgpu is using this api... I think they just use non_owner
> version when doing things from different task.
> So I don't think pairing non_owner with non_owner is strictly necessary.
> It seems fine to use down_read_trylock() with above rwsem_release() hack
> plus up_read_non_owner().
> Am I missing something?
>
The statement above is to clear the state for the lockdep so that it
knows that the task no longer owns the lock. Without doing that, there
is a possibility of some other kind of incorrect lockdep warning may be
produced because the code will still think the task hold a read-lock on
the mmap_sem. It is also possible no warning will be reported.
The bpf code is kind of special that it acquires the mmap_sem. Later on,
it either releases it itself (non-NMI) or request irq_work to release it
(NMI). So either, you use the _non_owner versions for both acquire and
release or fake the release like the code segment above.
Peter, please chime in if you have other suggestion.
Cheers,
Longman
next prev parent reply other threads:[~2019-01-31 2:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-30 4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
2019-01-30 4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
2019-01-30 10:21 ` Peter Zijlstra
2019-01-30 19:27 ` Alexei Starovoitov
2019-01-30 19:53 ` Peter Zijlstra
2019-01-30 20:18 ` Alexei Starovoitov
2019-01-30 4:04 ` [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap Alexei Starovoitov
2019-01-30 10:15 ` Peter Zijlstra
2019-01-30 19:30 ` Alexei Starovoitov
2019-01-30 19:42 ` Waiman Long
2019-01-30 20:10 ` Alexei Starovoitov
2019-01-30 21:11 ` Waiman Long
2019-01-30 21:32 ` Waiman Long
2019-01-31 2:01 ` Alexei Starovoitov
2019-01-31 2:48 ` Waiman Long [this message]
2019-02-06 3:21 ` Eric Dumazet
2019-02-06 3:30 ` Alexei Starovoitov
2019-02-06 3:40 ` Eric Dumazet
2019-01-30 19:44 ` Peter Zijlstra
2019-01-30 20:05 ` Waiman Long
2019-01-30 4:04 ` [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register Alexei Starovoitov
2019-01-30 10:10 ` Peter Zijlstra
2019-01-30 10:37 ` Peter Zijlstra
2019-01-30 19:32 ` Alexei Starovoitov
2019-01-30 19:46 ` Peter Zijlstra
2019-01-30 4:04 ` [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock Alexei Starovoitov
2019-01-30 4:07 ` [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e7748be1-8b0b-6613-159f-1c45f932c617@redhat.com \
--to=longman@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jannh@google.com \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).