From: Andrea Arcangeli <aarcange@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Avi Kivity <avi@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host
Date: Wed, 22 Aug 2012 18:29:55 +0200 [thread overview]
Message-ID: <20120822162955.GT29978@redhat.com> (raw)
In-Reply-To: <5034763D.60508@linux.vnet.ibm.com>
On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote:
> On 08/21/2012 11:06 PM, Andrea Arcangeli wrote:
> > CPU0 CPU1
> > oldpage[1] == 0 (both guest & host)
> > oldpage[0] = 1
> > trigger do_wp_page
>
> We always do ptep_clear_flush before set_pte_at_notify(),
> at this point, we have done:
> pte = 0 and flush all tlbs
> > mmu_notifier_change_pte
> > spte = newpage + writable
> > guest does newpage[1] = 1
> > vmexit
> > host read oldpage[1] == 0
>
> It can not happen, at this point pte = 0, host can not
> access oldpage anymore, host read can generate #PF, it
> will be blocked on page table lock until CPU 0 release the lock.
Agreed, this is why your fix is safe.
So the thing is, it is never safe to mangle the secondary MMU before
the primary MMU. This is why your patch triggered all sort of alarm
bells to me and I was tempted to suggest an obviously safe
alternative.
The reason why your patch is safe, is that the required primary MMU
pte mangling happens before the set_pte_at_notify is invoked.
Other details about change_pte:
1) it is only safe to use on an already readonly pte if the pfn is
being altered
2) it is only safe to run on a read-write mapping to convert it to a
readonly mapping if the pfn doesn't change
KSM uses it as 2) in page_write_protect.
KSM uses it as 1) in replace_page and do_wp_page uses it as 1) too.
The new constraint for its safety after your fix is that it must
always be preceded by a ptep_clear_flush.
Of course it's quite natural that it is preceded by a
ptep_clear_flush, other things would risk to go wrong if
ptep_clear_flush wasn't done, so there's little risk of getting it
wrong.
I thought, maybe it would be clearer to do it as
ptep_clear_flush_notify_at(pte). That would avoid having methods that
rings the alarm bells. But the O_DIRECT check of KSM in
page_write_protect prevents such a change (there we need to run a
standalone ptep_clear_flush).
I suggest only adding a comment to mention the real primary MMU pte
update happens before set_pte_at_notify is invoked and we're not
really doing secondary MMU updates before primary MMU updates which
wouldn't never be safe.
It never would be safe because the secondary MMU can be just a TLB and
even the KSM sptes can be dropped at any time and refilled through
secondary MMU page faults running gup_fast. The PT lock won't stop it.
Thanks a lot for fixing this subtle race!
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-08-22 16:29 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-21 9:46 [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host Xiao Guangrong
2012-08-21 15:06 ` Andrea Arcangeli
2012-08-22 3:51 ` Xiao Guangrong
2012-08-22 4:12 ` Hugh Dickins
2012-08-22 5:37 ` Xiao Guangrong
2012-08-22 16:37 ` Andrea Arcangeli
2012-08-23 7:25 ` Xiao Guangrong
2012-08-22 6:03 ` Xiao Guangrong
2012-08-22 16:29 ` Andrea Arcangeli [this message]
2012-08-22 19:15 ` Andrew Morton
2012-08-22 19:50 ` Andrea Arcangeli
2012-08-22 19:58 ` Andrew Morton
2012-08-22 20:14 ` Andrea Arcangeli
2012-08-23 7:34 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120822162955.GT29978@redhat.com \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mtosatti@redhat.com \
--cc=xiaoguangrong@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).