Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Suren Baghdasaryan <surenb@google.com>, willy@infradead.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net,
	punit.agrawal@bytedance.com, lstoakes@gmail.com,
	hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com,
	ying.huang@intel.com, yuzhao@google.com, dhowells@redhat.com,
	hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
Subject: Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults
Date: Thu, 10 Aug 2023 09:40:57 +0200	[thread overview]
Message-ID: <01e20a4a-35dc-b342-081f-0edaf8780f51@redhat.com> (raw)
In-Reply-To: <CAJuCfpH8ucOkCFYrVZafUAppi5+mVhy=uD+BK6-oYX=ysQv5qQ@mail.gmail.com>

On 10.08.23 08:24, Suren Baghdasaryan wrote:
> On Wed, Aug 9, 2023 at 10:29 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Wed, Aug 9, 2023 at 11:31 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>
>>> On Wed, Aug 9, 2023 at 11:08 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>
>>>> On Wed, Aug 9, 2023 at 11:04 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>>>>>> Which ends up being
>>>>>>>>>
>>>>>>>>> VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>>>>>>>>>
>>>>>>>>> I did not check if this is also the case on mainline, and if this series is responsible.
>>>>>>>>
>>>>>>>> Thanks for reporting! I'm checking it now.
>>>>>>>
>>>>>>> Hmm. From the code it's not obvious how lock_mm_and_find_vma() ends up
>>>>>>> calling find_vma() without mmap_lock after successfully completing
>>>>>>> get_mmap_lock_carefully(). lock_mm_and_find_vma+0x3f/0x270 points to
>>>>>>> the first invocation of find_vma(), so this is not even the lock
>>>>>>> upgrade path... I'll try to reproduce this issue and dig up more but
>>>>>>> from the information I have so far this issue does not seem to be
>>>>>>> related to this series.
>>>>>
>>>>> I just checked on mainline and it does not fail there.
>>>
>>> Thanks. Just to eliminate the possibility, I'll try reverting my
>>> patchset in mm-unstable and will try the test again. Will do that in
>>> the evening once I'm home.
>>>
>>>>>
>>>>>>
>>>>>> This is really weird. I added mmap_assert_locked(mm) calls into
>>>>>> get_mmap_lock_carefully() right after we acquire mmap_lock read lock
>>>>>> and one of them triggers right after successful
>>>>>> mmap_read_lock_killable(). Here is my modified version of
>>>>>> get_mmap_lock_carefully():
>>>>>>
>>>>>> static inline bool get_mmap_lock_carefully(struct mm_struct *mm,
>>>>>> struct pt_regs *regs) {
>>>>>>        /* Even if this succeeds, make it clear we might have slept */
>>>>>>        if (likely(mmap_read_trylock(mm))) {
>>>>>>            might_sleep();
>>>>>>            mmap_assert_locked(mm);
>>>>>>            return true;
>>>>>>        }
>>>>>>        if (regs && !user_mode(regs)) {
>>>>>>            unsigned long ip = instruction_pointer(regs);
>>>>>>            if (!search_exception_tables(ip))
>>>>>>                return false;
>>>>>>        }
>>>>>>        if (!mmap_read_lock_killable(mm)) {
>>>>>>            mmap_assert_locked(mm);                     <---- generates a BUG
>>>>>>            return true;
>>>>>>        }
>>>>>>        return false;
>>>>>> }
>>>>>
>>>>> Ehm, that's indeed weird.
>>>>>
>>>>>>
>>>>>> AFAIKT conditions for mmap_read_trylock() and
>>>>>> mmap_read_lock_killable() are checked correctly. Am I missing
>>>>>> something?
>>>>>
>>>>> Weirdly enough, it only triggers during that specific uffd test, right?
>>>>
>>>> Yes, uffd-unit-tests. I even ran it separately to ensure it's not some
>>>> fallback from a previous test and I'm able to reproduce this
>>>> consistently.
>>
>> Yeah, it is somehow related to per-vma locking. Unfortunately I can't
>> reproduce the issue on my VM, so I have to use my host and bisection
>> is slow. I think I'll get to the bottom of this tomorrow.
> 
> Ok, I think I found the issue. 

Nice!

> wp_page_shared() ->
> fault_dirty_shared_page() can drop mmap_lock (see the comment saying
> "Drop the mmap_lock before waiting on IO, if we can...", therefore we
> have to ensure we are not doing this under per-VMA lock.
> I think what happens is that this path is racing with another page
> fault which took mmap_lock for read. fault_dirty_shared_page()
> releases this lock which was taken by another page faulting thread and
> that thread generates an assertion when it finds out the lock it just
> took got released from under it.

I wonder if we could detect that someone releases the mmap lock that was 
not taken by that person, to bail out early at the right place when 
debugging such issues. Only with certain config knobs enabled, of course.

> The following crude change fixed the issue for me but there might be a
> more granular way to deal with this:
> 
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3293,18 +3293,18 @@ static vm_fault_t wp_page_shared(struct
> vm_fault *vmf, struct folio *folio)
>           struct vm_area_struct *vma = vmf->vma;
>           vm_fault_t ret = 0;
> 
> +        if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +                pte_unmap_unlock(vmf->pte, vmf->ptl);
> +                vma_end_read(vmf->vma);
> +                return VM_FAULT_RETRY;
> +        }
> +

I won't lie: all of these locking checks are a bit hard to get and 
possibly even harder to maintain.

Maybe better mmap unlock sanity checks as spelled out above might help 
improve part of the situation.


And maybe some comments regarding the placement might help as well ;)

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2023-08-10  7:42 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-30 21:19 [PATCH v7 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 1/6] swap: remove remnants of polling from read_swap_cache_async Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 4/6] mm: change folio_lock_or_retry to use vm_fault directly Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 5/6] mm: handle swap page faults under per-VMA lock Suren Baghdasaryan
2023-06-30 21:19 ` [PATCH v7 6/6] mm: handle userfaults under VMA lock Suren Baghdasaryan
2023-07-02 17:50 ` [PATCH v7 0/6] Per-VMA lock support for swap and userfaults Andrew Morton
2023-07-03 15:27   ` Suren Baghdasaryan
2023-08-09  7:48 ` David Hildenbrand
2023-08-09 14:28   ` Suren Baghdasaryan
2023-08-09 15:22     ` Suren Baghdasaryan
2023-08-09 17:17       ` Suren Baghdasaryan
2023-08-09 18:04         ` David Hildenbrand
2023-08-09 18:08           ` Suren Baghdasaryan
2023-08-09 18:31             ` Suren Baghdasaryan
2023-08-10  5:29               ` Suren Baghdasaryan
2023-08-10  6:24                 ` Suren Baghdasaryan
2023-08-10  7:40                   ` David Hildenbrand [this message]
2023-08-10 20:20                     ` Suren Baghdasaryan
2023-08-10 22:00                     ` Matthew Wilcox
2023-08-10 22:16                   ` Matthew Wilcox
2023-08-10 23:29                     ` Suren Baghdasaryan
2023-08-10 23:43                       ` Suren Baghdasaryan
2023-08-11  6:13                         ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01e20a4a-35dc-b342-081f-0edaf8780f51@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=brauner@kernel.org \
    --cc=dave@stgolabs.net \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jglisse@google.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@android.com \
    --cc=laurent.dufour@fr.ibm.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=mhocko@suse.com \
    --cc=michel@lespinasse.org \
    --cc=minchan@google.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=punit.agrawal@bytedance.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).