xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function
Date: Fri, 15 Jun 2012 10:41:19 +0100	[thread overview]
Message-ID: <4FDB033F.8050703@citrix.com> (raw)
In-Reply-To: <1339582845-25659-2-git-send-email-david.vrabel@citrix.com>

On 13/06/12 11:20, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> x86 provides a ptep_get_and_clear() function instead of using the
> generic implementation so it can atomically get and clear the PTE.
> 
> It is not clear why x86 has this requirement.  PTE updates are done
> while the appropriate PTE lock are held, so presumably, it is an
> attempt to ensure that the accessed and dirty bits of the PTE are
> saved even if they have been updated by the hardware due to a
> concurrent access on another CPU.
> 
> However, the atomic exchange is not sufficient for saving the dirty
> bit as if there is a TLB entry for this page on another CPU then the
> dirty bit may be written by that processor after the PTE is cleared
> but before the TLB entry is flushed.  It is also not clear from the
> Intel SDM[1] if the processor's read of the PTE and the write to set
> the accessed bit are atomic or not.

This reasoning is probably not correct.  When a dirty bit must be
updated in a PTE the processor does a pagetable walk (possibly using any
cached page table structures).  The AMD APM section 5.4.2 states:

"The processor never sets the Accessed bit or the Dirty bit for a not
present page (P = 0)."

and

"If PTE[D] is cleared to 0, software can rely on the fact that the page
has not been written."

Thus this patch would /introduce/ a race where a dirty bit set would be
lost (rather than extending the window where this would happen).

However (and this is a weaker argument), no sensible userspace
application should be accessing pages that are being unmapped or
remapped (since it is unpredictable whether they will fault) so perhaps
this additional unpredictable behaviour is acceptable?

> With this patch the get/clear becomes a read of the PTE and call to
> pte_clear() which allows the clears to be batched by some
> paravirtualized guests (such as Xen guests).  This can lead to
> significant performance improvements in munmap().  e.g., for Xen, a
> trap-and-emulate[2] for every page becomes a hypercall for every 31
> pages.
> 
> There should be no change in the performance on native or for KVM
> guests.  There may be a performance regression with lguest guests as
> an optimization for avoiding calling pte_update() when doing a full
> teardown of an mm is removed.

Having looked further into how lguest works the performance penalty is
likely to not be as bad as I first thought.  pte_clear() does call
pte_update() but these are batched and the batch size seems to be up-to
64 calls so the performance penalty should be pretty minimal.

> [1] Intel Software Developer's Manual Vol 3A section 4.8 says nothing
> on this.
> 
> [2] For 32-bit guests, two traps are required for every page to update
> both halves of the PTE.

  reply	other threads:[~2012-06-15  9:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-13 10:20 [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions David Vrabel
2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
2012-06-15  9:41   ` David Vrabel [this message]
2012-06-15 10:49     ` [Xen-devel] " Keir Fraser
2012-06-13 10:20 ` [PATCH 2/2] x86/mm: remove arch-specific pmdp_get_and_clear() function David Vrabel
2012-06-13 14:04 ` [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions Konrad Rzeszutek Wilk
2012-06-13 15:00   ` David Vrabel
2012-06-14 18:29     ` Konrad Rzeszutek Wilk
2012-06-14 18:41       ` H. Peter Anvin
2012-06-18  9:13         ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FDB033F.8050703@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=hpa@zytor.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).