From: Jerome Glisse <jglisse@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Peter Xu" <peterx@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Arnaldo Carvalho de Melo" <acme@kernel.org>,
"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
"Jiri Olsa" <jolsa@redhat.com>,
"Namhyung Kim" <namhyung@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Matthew Wilcox" <mawilcox@microsoft.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Michal Hocko" <mhocko@kernel.org>,
kvm@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory
Date: Mon, 11 Feb 2019 14:09:31 -0500 [thread overview]
Message-ID: <20190211190931.GA3908@redhat.com> (raw)
In-Reply-To: <20190201235738.GA12463@redhat.com>
On Fri, Feb 01, 2019 at 06:57:38PM -0500, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Thu, Jan 31, 2019 at 01:37:02PM -0500, Jerome Glisse wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> >
> > This patchset is on top of my patchset to add context information to
> > mmu notifier [1] you can find a branch with everything [2]. I have not
> > tested it but i wanted to get the discussion started. I believe it is
> > correct but i am not sure what kind of kvm test i can run to exercise
> > this.
> >
> > The idea is that since kvm will invalidate the secondary MMUs within
> > invalidate_range callback then the change_pte() optimization is lost.
> > With this patchset everytime core mm is using set_pte_at_notify() and
> > thus change_pte() get calls then we can ignore the invalidate_range
> > callback altogether and only rely on change_pte callback.
> >
> > Note that this is only valid when either going from a read and write
> > pte to a read only pte with same pfn, or from a read only pte to a
> > read and write pte with different pfn. The other side of the story
> > is that the primary mmu pte is clear with ptep_clear_flush_notify
> > before the call to change_pte.
>
> If it's cleared with ptep_clear_flush_notify, change_pte still won't
> work. The above text needs updating with
> "ptep_clear_flush". set_pte_at_notify is all about having
> ptep_clear_flush only before it or it's the same as having a range
> invalidate preceding it.
>
> With regard to the code, wp_page_copy() needs
> s/ptep_clear_flush_notify/ptep_clear_flush/ before set_pte_at_notify.
>
> change_pte relies on the ptep_clear_flush preceding the
> set_pte_at_notify that will make sure if the secondary MMU mapping
> randomly disappears between ptep_clear_flush and set_pte_at_notify,
> gup_fast will wait and block on the PT lock until after
> set_pte_at_notify is completed before trying to re-establish a
> secondary MMU mapping.
>
> So then we've only to worry about what happens because we left the
> secondary MMU mapping potentially intact despite we flushed the
> primary MMU mapping with ptep_clear_flush (as opposed to
> ptep_clear_flush_notify which would teardown the secondary MMU mapping
> too).
So all the above is moot since as you pointed out in the other email
ptep_clear_flush_notify does not invalidate kvm secondary mmu hence.
>
> In you wording above at least the "with a different pfn" is
> superflous. I think it's ok if the protection changes from read-only
> to read-write and the pfn remains the same. Like when we takeover a
> page because it's not shared anymore (fork child quit).
>
> It's also ok to change pfn if the mapping is read-only and remains
> read-only, this is what KSM does in replace_page.
Yes i thought this was obvious i will reword and probably just do a
list of every case that is fine.
>
> The read-write to read-only case must not change pfn to avoid losing
> coherency from the secondary MMU point of view. This isn't so much
> about change_pte itself, but the fact that the page-copy generally
> happens well before the pte mangling starts. This case never presents
> itself in the code because KSM is first write protecting the page and
> only later merging it, regardless of change_pte or not.
>
> The important thing is that the secondary MMU must be updated first
> (unlike the invalidates) to be sure the secondary MMU already points
> to the new page when the pfn changes and the protection changes from
> read-only to read-write (COW fault). The primary MMU cannot read/write
> to the page anyway while we update the secondary MMU because we did
> ptep_clear_flush() before calling set_pte_at_notify(). So this
> ordering of "ptep_clear_flush; change_pte; set_pte_at" ensures
> whenever the CPU can access the memory, the access is synchronous
> with the secondary MMUs because they've all been updated already.
>
> If (in set_pte_at_notify) we were to call change_pte() after
> set_pte_at() what would happen is that the CPU could write to the page
> through a TLB fill without page fault while the secondary MMUs still
> read the old memory in the old readonly page.
Yeah, between do you have any good workload for me to test this ? I
was thinking of running few same VM and having KSM work on them. Is
there some way to trigger KVM to fork ? As the other case is breaking
COW after fork.
Cheers,
Jérôme
next prev parent reply other threads:[~2019-02-11 19:09 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-31 18:37 [RFC PATCH 0/4] Restore change_pte optimization to its former glory jglisse
2019-01-31 18:37 ` [RFC PATCH 1/4] uprobes: use set_pte_at() not set_pte_at_notify() jglisse
2019-02-02 0:50 ` Andrea Arcangeli
2019-02-11 19:27 ` Jerome Glisse
2019-01-31 18:37 ` [RFC PATCH 2/4] mm/mmu_notifier: use unsigned for event field in range struct jglisse
2019-02-02 1:13 ` Andrea Arcangeli
2019-01-31 18:37 ` [RFC PATCH 3/4] mm/mmu_notifier: set MMU_NOTIFIER_USE_CHANGE_PTE flag where appropriate jglisse
2019-01-31 18:37 ` [RFC PATCH 4/4] kvm/mmu_notifier: re-enable the change_pte() optimization jglisse
2019-02-01 23:57 ` [RFC PATCH 0/4] Restore change_pte optimization to its former glory Andrea Arcangeli
2019-02-02 0:14 ` Andrea Arcangeli
2019-02-11 19:09 ` Jerome Glisse [this message]
2019-02-11 20:02 ` Andrea Arcangeli
2019-02-18 16:04 ` Jerome Glisse
2019-02-18 17:45 ` Andrea Arcangeli
2019-02-18 18:20 ` Jerome Glisse
2019-02-19 2:37 ` Peter Xu
2019-02-19 2:43 ` Jerome Glisse
2019-02-19 3:33 ` Jerome Glisse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190211190931.GA3908@redhat.com \
--to=jglisse@redhat.com \
--cc=aarcange@redhat.com \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=jolsa@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mawilcox@microsoft.com \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=rkrcmar@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.