public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Liam Howlett <liam.howlett@oracle.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Luigi Rizzo <lrizzo@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	"kernel-team@fb.com" <kernel-team@fb.com>,
	"walken@fb.com" <walken@fb.com>
Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock
Date: Wed, 8 Sep 2021 22:50:35 -0700	[thread overview]
Message-ID: <18766a42-ea2f-e019-a8f7-43db50870300@fb.com> (raw)
In-Reply-To: <20210908233331.GA3544071@ziepe.ca>



On 9/8/21 4:33 PM, Jason Gunthorpe wrote:
> On Wed, Sep 08, 2021 at 12:11:54PM -0700, Yonghong Song wrote:
>>
>>
>> On 9/8/21 11:49 AM, Jason Gunthorpe wrote:
>>> On Wed, Sep 08, 2021 at 06:30:52PM +0000, Liam Howlett wrote:
>>>
>>>>    /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
>>>> -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>>>> +struct vm_area_struct *find_vma_non_owner(struct mm_struct *mm,
>>>> +					 unsigned long addr)
>>>>    {
>>>>    	struct rb_node *rb_node;
>>>>    	struct vm_area_struct *vma;
>>>> -	mmap_assert_locked(mm);
>>>> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>>>>    	/* Check the cache first. */
>>>>    	vma = vmacache_find(mm, addr);
>>>>    	if (likely(vma))
>>>> @@ -2325,6 +2326,11 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>>>>    	return vma;
>>>>    }
>>>> +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>>>> +{
>>>> +	lockdep_assert_held(&mm->mmap_lock);
>>>> +	return find_vma_non_owner(mm, addr);
>>>> +}
>>>>    EXPORT_SYMBOL(find_vma);
>>>>    /*
>>>>
>>>>
>>>> Although this leaks more into the mm API and was referred to as ugly
>>>> previously, it does provide a working solution and still maintains the
>>>> same level of checking.
>>>
>>> I think it is no better than before.
>>>
>>> The solution must be to not break lockdep in the BPF side. If Peter's
>>> reworked algorithm is not OK then BPF should drop/acquire the lockdep
>>> when it punts the unlock to the WQ.
>>
>> The current warning is triggered by bpf calling find_vma().
> 
> Yes, but that is because the lockdep has already been dropped.
> 
> It looks to me like it basically does this:
> 
>          mmap_read_trylock_non_owner(current->mm)
> 
>          vma = find_vma(current->mm, ips[i]);
> 
>          if (!work) {
>                  mmap_read_unlock_non_owner(current->mm);
>          } else {
>                  work->mm = current->mm;
>                  irq_work_queue(&work->irq_work);
> 
> 
> And the only reason for this lockdep madness is because the
> irq_work_queue() does:
> 
> static void do_up_read(struct irq_work *entry)
> {
>          struct stack_map_irq_work *work;
> 
>          if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
>                  return;
> 
>          work = container_of(entry, struct stack_map_irq_work, irq_work);
>          mmap_read_unlock_non_owner(work->mm);
> }
> 
> 
> This is all about deferring the unlock to outside an IRQ context. The
> lockdep ownership is transfered from the IRQ to the work, which is
> something that we don't usually do or model in lockdep.
> 
> Lockdep complains with the straightforward code because exiting an IRQ
> with locks held is illegal.
> 
> The saner version is more like:
> 
>          mmap_read_trylock(current->mm)
> 
>          vma = find_vma(current->mm, ips[i]);
> 
>          if (!work) {
>                  mmap_read_unlock(current->mm);
>          } else {
>                  work->mm = current->mm;
>                  <tell lockdep we really do mean to return with
> 		 the lock held>
>                  rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
>                  irq_work_queue(&work->irq_work);
> 
> 
> do_up_read():
>         <tell lockdep the lock was already released from the map>
>         mmap_read_unlock_non_owner(work->mm);
> 
> ie properly model in lockdep that ownership moves from the IRQ to the
> work. At least we don't corrupt the core mm code with this insanity.

Thanks for the suggestion! I verified the above change indeed work.
Will send v4 soon.

> 
> Jason
> 


  reply	other threads:[~2021-09-09  5:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08  4:44 [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock Yonghong Song
2021-09-08 12:20 ` Daniel Borkmann
2021-09-08 13:53   ` Jason Gunthorpe
2021-09-08 14:15     ` Peter Zijlstra
2021-09-08 14:43       ` Luigi Rizzo
2021-09-08 15:12         ` Liam Howlett
2021-09-08 16:09           ` Yonghong Song
2021-09-08 17:09             ` Luigi Rizzo
2021-09-08 17:21               ` Alexei Starovoitov
2021-09-08 17:52                 ` Andrew Morton
2021-09-08 18:02                   ` Alexei Starovoitov
2021-09-08 18:15                     ` Andrew Morton
2021-09-08 18:20                       ` Alexei Starovoitov
2021-09-08 18:30                         ` Liam Howlett
2021-09-08 18:45                           ` Yonghong Song
2021-09-08 18:49                           ` Jason Gunthorpe
2021-09-08 19:11                             ` Yonghong Song
2021-09-08 23:33                               ` Jason Gunthorpe
2021-09-09  5:50                                 ` Yonghong Song [this message]
2021-09-09  8:05                           ` Peter Zijlstra
2021-09-08 18:43                     ` Liam Howlett
2021-09-08 19:42                       ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18766a42-ea2f-e019-a8f7-43db50870300@fb.com \
    --to=yhs@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jgg@ziepe.ca \
    --cc=kernel-team@fb.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=lrizzo@google.com \
    --cc=peterz@infradead.org \
    --cc=walken@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox