Re: [PATCH] turn off writable page tables

From: Zachary Amsden <zach@vmware.com>
To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: virtualization@lists.osdl.org,
	Andrew Theurer <habanero@us.ibm.com>,
	xen-devel@lists.xensource.com, Andi Kleen <ak@suse.de>
Subject: Re: [PATCH] turn off writable page tables
Date: Mon, 31 Jul 2006 12:56:05 -0700
Message-ID: <44CE6055.3000902@vmware.com>
In-Reply-To: <07059add50c2b7826e967218da89c140@cl.cam.ac.uk>

Keir Fraser wrote:
>
> On 31 Jul 2006, at 10:32, Zachary Amsden wrote:
>
>>> It would allow set_pte() to switch between explicit queuing and 
>>> 'direct' writing. We moved away from the former a few years back as 
>>> doing it everywhere made a mess of the generic Linux mm code and it 
>>> was hard to reason whether our patches were correct. I guess doing 
>>> it for the most important subset of mm routines is not so bad. It's 
>>> a shame that, although many set_pte() call sites could determine 
>>> statically whether or not they will batch, we'd end up with a 
>>> dynamic run-time test everywhere (unless I'm mistaken) -- I wonder 
>>> if that has a measurable cost?
>>>
>>
>> We've actually seen a benefit for this, despite the cost of the 
>> non-static parameters, for both VMI Linux with shadow pagetables on 
>> ESX and VMI Linux with direct pagetables on Xen.  Turns out that as 
>> long as the call EIP is predictable, the parameters do not 
>> necessarily need to be so, and modern processors are getting much 
>> better at branch prediction.
>
> You mean that the benefit of batching outweighs the cost of an extra 
> test-and-branch in the middle of a loop, or that the extra 
> test-and-branch simply has unmeasurable overhead? The former is to be 
> expected, but I'd be worried about other call sites where batching 
> does not happen, and an effect on those.

The extra test-and-branch has unmeasurable overhead.  In the 
implementation we had chosen, there was already a branch requirement on 
the set_pte call anyway, to potentially delay the pte update so that it 
can piggyback onto a page invalidation with just one hypercall.  
Combining the two branches into one is trivial, and the cost of one 
extra branch here seems to be invisible.  We were getting better numbers 
for MMU-related workloads with VMI-Linux than with XenoLinux.  I don't 
have hard numbers on this, and even if I did, it would take some time to 
get them approved for public distribution.  For that I must apologize.  
But avoiding the changes that would otherwise be required - a full set 
of pte and tlb functions which could be delayed, as well as combining 
the pte update and invlpg into a single call - seemed worth a single 
branch.  I'm not even convinced these changes can be done in a way that 
would be safe for all architectures.  Of course, I may be wrong on that 
point - but there is no simple way I see to do it that affords the 
strong reasoning about correctness that the enter / leave semantic does.
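
For illustration only, here is a minimal user-space sketch of the enter / 
leave idea described above: one state variable, tested once in set_pte(), 
selects between a direct write and queuing.  Everything except the name 
set_pte is an assumption made for this sketch (batch_enter, batch_leave, 
batch_flush, hv_apply_updates, the queue size); it is not the VMI or Xen 
interface, and the "hypercall" is simulated with a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define BATCH_MAX 32                    /* hypothetical queue depth */

struct pte_update { pte_t *ptep; pte_t val; };

static int batch_active;                /* the single state variable */
static struct pte_update batch_queue[BATCH_MAX];
static size_t batch_len;
static unsigned long hypercalls;        /* counts simulated hypervisor traps */

/* Stand-in for a multi-update hypercall: applies n updates in one trap. */
static void hv_apply_updates(const struct pte_update *u, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *u[i].ptep = u[i].val;
    hypercalls++;
}

/* Push all queued updates to the (simulated) hypervisor in one call. */
static void batch_flush(void)
{
    if (batch_len) {
        hv_apply_updates(batch_queue, batch_len);
        batch_len = 0;
    }
}

/* Enter / leave bracket the region in which updates may be delayed. */
static void batch_enter(void)
{
    assert(!batch_active);              /* no nesting in this sketch */
    batch_active = 1;
}

static void batch_leave(void)
{
    batch_flush();
    batch_active = 0;
}

/* One run-time test decides between queuing and a direct update. */
static void set_pte(pte_t *ptep, pte_t val)
{
    if (batch_active) {
        if (batch_len == BATCH_MAX)     /* queue full: flush early */
            batch_flush();
        batch_queue[batch_len++] = (struct pte_update){ ptep, val };
    } else {
        struct pte_update u = { ptep, val };
        hv_apply_updates(&u, 1);
    }
}
```

A caller would take the relevant lock, call batch_enter(), issue its 
set_pte() calls, and call batch_leave() before dropping the lock; call 
sites outside such a region pay only for the one test in set_pte().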

>
>> Doing explicit batching exactly where it counts, under protection of 
>> locks, so that SMP safety is guaranteed turns out to be really easy, 
>> as well as a nice win.
>
> If the run-time check cost really isn't an issue (I'd like to see 
> numbers), we'd likely use this new interface in preference to 
> implicitly batched writable pagetables and would support its inclusion 
> in the kernel.

Sorry about not having numbers.  My biggest question is - do you need 
any information other than a single state variable to use 
explicit batching?  I thought, and Jeremy and Chris both pointed out as 
well, that Xen could potentially use the information about which PT to 
unhook to take advantage of writable pagetables.  But, if that is not 
the direction you are going, then it seems this information is not so 
relevant for the explicit batching.  The explicit batching does have one 
disadvantage without writable page tables, which is a potential long 
term maintenance / correctness issue - you must remove read hazards from 
these encapsulated paths.  That is not so hard to do, and not a large 
general problem, because the batching is explicit rather than implicit, 
so you can pick paths to batch that are small, compact, and easy to 
reason about.  Nevertheless, it is a point I would like to make sure you 
are comfortable with before we all decide these hooks will work for 
everyone.
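
To make the read hazard concrete, here is a small user-space sketch, 
again with hypothetical names (queue_pte, flush_queue, read_pte_hazard, 
read_pte_safe, PTE_WRITE).  While an update is still queued, a plain read 
of the page table entry returns the old value.  Flushing before the read 
is only one possible remedy, chosen here for brevity; restructuring the 
batched path so that it never reads is another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_WRITE 0x2u                  /* hypothetical "writable" bit */
#define QUEUE_MAX 16                    /* hypothetical queue depth */

struct pending { pte_t *ptep; pte_t val; };

static struct pending queue[QUEUE_MAX];
static size_t queued;

/* Record an update without touching the page table yet. */
static void queue_pte(pte_t *ptep, pte_t val)
{
    assert(queued < QUEUE_MAX);
    queue[queued++] = (struct pending){ ptep, val };
}

/* Apply every queued update, in order. */
static void flush_queue(void)
{
    for (size_t i = 0; i < queued; i++)
        *queue[i].ptep = queue[i].val;
    queued = 0;
}

/* Hazard: reads memory directly, so it misses updates still in the queue. */
static pte_t read_pte_hazard(const pte_t *ptep)
{
    return *ptep;
}

/* One simple remedy: make queued updates visible before reading. */
static pte_t read_pte_safe(const pte_t *ptep)
{
    flush_queue();
    return *ptep;
}
```

With a present, writable entry and a queued write-protect, the hazardous 
read still reports the entry as writable, while the flushing read does 
not.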

Zach
