Re: [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling
Date: Mon, 3 Oct 2022 17:27:52 -0400	[thread overview]
Message-ID: <YztT2OJake65WG3P@x1n> (raw)
In-Reply-To: <YzsZ3VV/95AvXDUz@monkey>

On Mon, Oct 03, 2022 at 10:20:29AM -0700, Mike Kravetz wrote:
> On 10/03/22 11:56, Peter Xu wrote:
> > After the recent rework patchset of hugetlb locking on pmd sharing,
> > kselftest for userfaultfd sometimes fails on hugetlb private tests with
> > unexpected write fault checks.
> > 
> > It turns out there's nothing wrong within the locking series regarding this
> > matter, but it could have changed the timing of threads so it can trigger
> > an old bug.
> > 
> > The real bug is when we call hugetlb_no_page() we're not with the pgtable
> > lock.  It means we're reading the pte values lockless.  It's perfectly fine
> > in most cases because before we do normal page allocations we'll take the
> > lock and check pte_same() again.  However before that, there are actually
> > two paths on userfaultfd missing/minor handling that may directly move on
> > with the fault process without checking the pte values.
> > 
> > It means for these two paths we may be generating an uffd message based on
> > an unstable pte, while an unstable pte can legally be anything as long as
> > the modifier holds the pgtable lock.
> > 
> > One example, which is also what happened in the failing kselftest and
> > caused the test failure, is that for private mappings CoW can happen on one
> > page.  CoW requires pte being cleared before being replaced with a new page
> > for TLB coherency, but then there can be a race condition:
> > 
> >         thread 1                              thread 2
> >         --------                              --------
> > 
> >       hugetlb_fault                         hugetlb_fault
> >         private pte RO
> >         hugetlb_wp
> >           pgtable_lock()
> >           huge_ptep_clear_flush
> >                                               pte=NULL
> >                                               hugetlb_no_page
> >                                                 generate uffd missing event
> >                                                 even if page existed!!
> >           set_huge_pte_at
> >           pgtable_unlock()
> 
> Thanks for working on this Peter!
> 
> I agree with this patch, but I suspect the above race is not possible.  Why?
> In both cases, we take the hugetlb fault mutex when processing a huegtlb
> page fault.  This means only one thread can execute the fault code for
> a specific mapping/index at a time.  This is why I was so confused, and may
> remain a bit confused about how we end up with userfault processing a write
> fault.  It was part of the reason for my 'unclear' wording about this being
> more about cpus not seeing updated values.  Note that we do drop the fault
> mutex before calling handle_usefault, but by then we have already made the
> 'missing' determination.
> 
> Thoughts?  Perhaps, I am still confused.

It's my fault to have the commit message wrong, sorry. :) And thanks for
raising this question, I could have overlooked that.

It turns out it's not the CoW that's clearing the pte... it's the
wr-protect with huge_ptep_modify_prot_start().  So the race is with
UFFDIO_WRITEPROTECT, not CoW.

Obviously when I was tracking the hpte changes I overlooked
huge_ptep_get_and_clear(), only seeing the CoW path and I'm pretty sure
that's already a bug which was obvious enough.  I didn't prove they
happened at the same time during the MISSING event.

Then after I further looked at the CoW code I start to question myself on
why CoW would trigger at all even with an available fault mutex, since for
private mappings mapcount should be 1:

	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
		if (!PageAnonExclusive(old_page))
			page_move_anon_rmap(old_page, vma);
		if (likely(!unshare))
			set_huge_ptep_writable(vma, haddr, ptep);

		delayacct_wpcopy_end();
		return 0;
	}

There could still be something else I didn't catch, even though that may
not be relevant to this patchset anymore.

I'll wait a few more hours for other reviewers, then prepare a new version
with the corrected commit message.

Thanks,

-- 
Peter Xu

next prev parent reply	other threads:[~2022-10-03 21:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 15:56 [PATCH 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu
2022-10-03 15:56 ` [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu
2022-10-03 17:20   ` Mike Kravetz
2022-10-03 21:27     ` Peter Xu [this message]
2022-10-03 21:45       ` Mike Kravetz
2022-10-04  0:28         ` Peter Xu
2022-10-03 21:00   ` Nadav Amit
2022-10-03 21:16     ` Peter Xu
2022-10-03 21:32       ` Nadav Amit
2022-10-03 15:56 ` [PATCH 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu
2022-10-03 15:56 ` [PATCH 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YztT2OJake65WG3P@x1n \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.