From: Peter Xu <peterx@redhat.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>, Maya Gokhale <gokhale2@llnl.gov>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Martin Cracauer <cracauer@cons.org>,
	Denis Plotnikov <dplotnikov@virtuozzo.com>,
	Shaohua Li <shli@fb.com>, Andrea Arcangeli <aarcange@redhat.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Marty McFadden <mcfadden8@llnl.gov>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH RFC 10/24] userfaultfd: wp: add WP pagetable tracking to x86
Date: Fri, 25 Jan 2019 11:30:44 +0800
Message-ID: <20190125033044.GP18231@xz-x1>
In-Reply-To: <20190124154050.GC5030@redhat.com>

On Thu, Jan 24, 2019 at 10:40:50AM -0500, Jerome Glisse wrote:
> On Thu, Jan 24, 2019 at 01:16:16PM +0800, Peter Xu wrote:
> > On Mon, Jan 21, 2019 at 10:09:38AM -0500, Jerome Glisse wrote:
> > > On Mon, Jan 21, 2019 at 03:57:08PM +0800, Peter Xu wrote:
> > > > From: Andrea Arcangeli <aarcange@redhat.com>
> > > > 
> > > > Accurate userfaultfd WP tracking is possible by tracking exactly which
> > > > virtual memory ranges were writeprotected by userland. We can't rely
> > > > only on the RW bit of the mapped pagetable because that information is
> > > > destroyed by fork() or KSM or swap. If we were to rely on that, we'd
> > > > need to stay on the safe side and generate false positive wp faults
> > > > for every swapped out page.
> > 
> > (I'm trying to leave comments with my own understanding here; they
> >  might not be the original purposes when Andrea proposed the idea.
> > >  Andrea, please feel free to chime in anytime especially if I am
> >  wrong... :-)
> > 
> > > 
> > > So you want to forward write fault (of a protected range) to user space
> > > only if page is not write protected because of fork(), KSM or swap.
> > > 
> > > > This write protection feature is only for anonymous pages, right? Otherwise
> > > > how would you protect a shared page (i.e. anyone can look it up,
> > > > call page_mkwrite on it and start writing to it)?
> > 
> > AFAIU we want to support shared memory too in the future.  One example
> > I can think of is current QEMU usage with DPDK: we have two processes
> > sharing the guest memory range.  So indeed this might not work if
> > there are unknown/malicious users of the shared memory, however in
> > many use cases the users are all known and AFAIU we should just write
> > protect all these users then we'll still get notified when any of them
> > write to a page.
> > 
> > > 
> > > > So for an anonymous page, for fork() the mapcount will tell you if the
> > > > page is write protected for COW. For KSM it is easy: check the page flag.
> > 
> > Yes I agree that KSM should be easy.  But for COW, please consider
> > when we write protect a page that was shared and RW removed due to
> > COW.  Then when we page fault on this page should we report to the
> > > monitor?  IMHO we can't know that without a specific bit in the PTE.
> > 
> > > 
> > > > For swap you can use the page lock to synchronize. A page that is
> > > > write protected because of swap is write protected because it is being
> > > > written to disk, thus either under page lock, or with PageWriteback()
> > > > returning true while the write is ongoing.
> > 
> > > For swap I think the major problem is when the page was swapped out of
> > > main memory and then we write to the page (which is by then only a swap
> > > entry).  We'll first swap the page back into main memory, but then
> > > IMHO we face an issue similar to COW above - we can't judge whether
> > > this page was write protected by uffd-wp at all.  Of course here we
> > > can check the VMA flags and assume it's write protected if the
> > > UFFD_WP flag is set on the VMA, however we'll also mark those pages
> > > which were not write protected at all, hence it'll generate false
> > > positive write protection messages.  The same reasoning applies to
> > > the COW use case above.  In conclusion, in these use cases we cannot
> > > identify write protection at page granularity without a specific
> > > _PAGE_UFFD_WP bit in the PTE entries.
> 
> So i need to think a bit more on this, probably not right now
> but just so i get the chain of events properly:
>   1 - user space ioctl UFD to write protect a range
>   2 - UFD set a flag on the vma and update CPU page table

A small clarification on these two steps: changing the VMA flags and
changing the PTE permissions are separate operations.  Say, to write
protect a newly mmap()ed region, we need to do:

  (a) ioctl UFFDIO_REGISTER upon the range: this will properly attach
      the VM_UFFD_WP flag upon the VMA object, and...

  (b) ioctl UFFDIO_WRITEPROTECT upon the range again: this will
      properly apply the new uffd-wp bit and write protect the
      PTEs/PMDs.

Note that the range specified in step (b) can be just a part of the
buffer: it does not need to cover the whole VMA, and it works at page
granularity.
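
To make the split between (a) and (b) concrete, here is a toy userspace
model in C.  It is only a sketch and not kernel code: struct vma_model,
MODEL_NPAGES, model_register() and model_writeprotect() are made-up
names for illustration, and the rule that step (b) fails on an
unregistered VMA is an assumption of this model.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 8  /* hypothetical size of the modeled VMA, in pages */

/* Toy model of one VMA; not a kernel structure. */
struct vma_model {
    bool vm_uffd_wp;                 /* stands in for VM_UFFD_WP on the VMA */
    bool pte_uffd_wp[MODEL_NPAGES];  /* stands in for the per-PTE uffd-wp bit */
    bool pte_write[MODEL_NPAGES];    /* stands in for the PTE RW bit */
};

/* Step (a), UFFDIO_REGISTER: only tags the VMA, the PTEs are untouched. */
void model_register(struct vma_model *vma)
{
    vma->vm_uffd_wp = true;
}

/*
 * Step (b), UFFDIO_WRITEPROTECT on pages [first, first + count).
 * Returns 0 on success, -1 if the VMA was never registered (a model
 * assumption) or if the range does not fit inside the VMA.
 */
int model_writeprotect(struct vma_model *vma, size_t first, size_t count)
{
    if (!vma->vm_uffd_wp)
        return -1;
    if (first > MODEL_NPAGES || count > MODEL_NPAGES - first)
        return -1;
    for (size_t i = first; i < first + count; i++) {
        vma->pte_uffd_wp[i] = true;   /* remember that uffd asked for this */
        vma->pte_write[i] = false;    /* and drop write permission */
    }
    return 0;
}
```

In this model, registering and then protecting pages 2..4 leaves pages
0, 1 and 5..7 writable with no uffd-wp bit, even though the VMA flag
covers all eight pages.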

>   3 - page can be individually write faulted and it sends a
>       signal to UFD listener and they handle the fault
>   4 - UFD kernel updates the page table once userspace has
>       handled the fault and sent result to UFD. At this point
>       the vma still has the UFD write protect flag set.

Yes. As explained above, the VMA can have the VM_UFFD_WP flag even if
none of the PTEs underneath was write protected.

> 
> So at any point in time in a range you might have writeable
> ptes that correspond to already handled UFD write faults. Now
> if COW, KSM or swap happens on those then on the next write
> fault you do not want to send a signal to userspace but handle
> the fault just as usual ?

Yes.  If the uffd write protection on a PTE has already been resolved,
the PTE will behave just like a normal PTE, because when resolving the
uffd-wp page fault we also remove the special uffd-wp bit from the
PTE/PMD.
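
As a rough illustration of that decision, the following C sketch
classifies a write fault from the VMA flag and the per-PTE bit.  It is
a simplified model, not the actual fault path: struct wp_pte,
classify_write_fault() and clear_uffd_wp() are hypothetical names, and
clear_uffd_wp() assumes the simple case of a page that is not shared.

```c
#include <stdbool.h>

/* Toy PTE: only the two bits relevant to this discussion. */
struct wp_pte {
    bool write;    /* stands in for the RW bit */
    bool uffd_wp;  /* stands in for the uffd-wp bit */
};

enum wp_fault_action {
    FAULT_NOTIFY_MONITOR,   /* forward the fault to the userfaultfd monitor */
    FAULT_HANDLE_NORMALLY,  /* COW, swap-in, KSM break, ... as usual */
};

/*
 * Decide what to do with a write fault.  The VMA flag alone is not
 * enough: only a PTE that still carries the uffd-wp bit is reported.
 */
enum wp_fault_action classify_write_fault(bool vma_uffd_wp,
                                          const struct wp_pte *pte)
{
    if (vma_uffd_wp && pte->uffd_wp)
        return FAULT_NOTIFY_MONITOR;
    return FAULT_HANDLE_NORMALLY;
}

/* Resolving the fault drops the bit; the PTE is then an ordinary PTE. */
void clear_uffd_wp(struct wp_pte *pte)
{
    pte->uffd_wp = false;
    pte->write = true;  /* model assumption: page is not shared */
}
```

So a PTE that was resolved once and later loses its RW bit again (say
after fork()) is classified as FAULT_HANDLE_NORMALLY, even though the
VMA still has the flag set.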

And IMHO what's more special here is when we write protect a private
page that is shared for COW (I'll skip KSM since it looks very much
like this case IIUC): due to COW the PTE has already lost the RW bit,
so when we do uffd-wp upon this page we simply apply the uffd-wp bit
to mark that this PTE was explicitly write protected by userfaults.
And when we want to resolve the uffd-wp for such a PTE, we first check
page_mapcount() and do COW if the page is still shared by others.
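
The COW case can be sketched the same way.  Again this is only a toy
model under stated assumptions: struct cow_pte, cow_model_wp_apply()
and cow_model_wp_resolve() are invented for illustration, and the
mapcount argument stands in for what page_mapcount() would return.

```c
#include <stdbool.h>

/* Toy PTE for a private page that may be shared for COW. */
struct cow_pte {
    bool write;    /* stands in for the RW bit */
    bool uffd_wp;  /* stands in for the uffd-wp bit */
};

enum resolve_action {
    RESOLVE_MAKE_WRITABLE,  /* page is exclusive: give RW back */
    RESOLVE_NEEDS_COW,      /* page is shared: copy before writing */
};

/*
 * Apply uffd-wp.  On a COW-shared page the RW bit is already clear,
 * so in effect only the uffd-wp bit changes.
 */
void cow_model_wp_apply(struct cow_pte *pte)
{
    pte->uffd_wp = true;
    pte->write = false;
}

/*
 * Resolve uffd-wp.  The bit is always dropped; write permission is
 * only restored directly when nobody else maps the page.
 */
enum resolve_action cow_model_wp_resolve(struct cow_pte *pte, int mapcount)
{
    pte->uffd_wp = false;
    if (mapcount > 1)
        return RESOLVE_NEEDS_COW;  /* RW stays clear until COW is done */
    pte->write = true;
    return RESOLVE_MAKE_WRITABLE;
}
```

With mapcount == 2 the resolve step leaves the PTE read-only and asks
for COW; with mapcount == 1 it simply makes the PTE writable again.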

> 
> I believe this is the event flow, so i will ponder on this some
> more :)

Yes please. :) The workflow of the new ioctl()s was also mentioned in
the cover letter.  Please feel free to have a look too.

Thanks,

-- 
Peter Xu
