Re: [PATCH v2 1/6] x86/EPT: don't walk entire page tables when globally changing types

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Tim Deegan <tim@xen.org>
To: Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	suravee.suthikulpanit@amd.com, Eddie Dong <eddie.dong@intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH v2 1/6] x86/EPT: don't walk entire page tables when globally changing types
Date: Thu, 24 Apr 2014 17:41:03 +0200	[thread overview]
Message-ID: <20140424154103.GH48969@deinos.phlegethon.org> (raw)
In-Reply-To: <53567C85020000780000ABA3@nat28.tlf.novell.com>

At 13:28 +0100 on 22 Apr (1398169701), Jan Beulich wrote:
> Instead leverage the EPT_MISCONFIG VM exit by marking just the top
> level entries as needing recalculation of their type, propagating the
> the recalculation state down as necessary such that the actual
> recalculation gets done upon access.
> 
> For this to work, we have to
> - restrict the types between which conversions can be done (right now
>   only the two types involved in log dirty tracking need to be taken
>   care of)
> - remember the ranges that log dirty tracking was requested for as well
>   as whether global log dirty tracking is in effect
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Generally looks good; I think the core idea of reusing the misconfig
mechanism is the right way to go. 

The naming you've chosen implies that it might be extended later to
other lazily-updated types; I'm not sure how well that will scale.
OTOH right now I can't think of any other users for this (and I can 
see why you're not happy with using change_type() to flush out
foreign mappings on teardown after this changes).

A few comments on detail below:

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -110,11 +110,18 @@ int hap_track_dirty_vram(struct domain *
>          if ( begin_pfn != dirty_vram->begin_pfn ||
>               begin_pfn + nr != dirty_vram->end_pfn )
>          {
> +            unsigned long ostart = dirty_vram->begin_pfn;
> +            unsigned long oend = dirty_vram->end_pfn;
> +
>              dirty_vram->begin_pfn = begin_pfn;
>              dirty_vram->end_pfn = begin_pfn + nr;
>  
>              paging_unlock(d);
>  
> +            if ( oend > ostart )
> +                p2m_change_type_range(d, ostart, oend,
> +                                      p2m_ram_logdirty, p2m_ram_rw);
> +

Does this (and the simlar change below) not risk losing updates if
we're in full log-dirty mode?  I think it should be fine to leave
those entries as log-dirty, since at worst they'll provoke another
fault-and-fixup.

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -116,8 +116,14 @@ static int p2m_init_hostp2m(struct domai
>  
>      if ( p2m )
>      {
> -        d->arch.p2m = p2m;
> -        return 0;
> +        p2m->logdirty_ranges = rangeset_new(d, "log-dirty",
> +                                            RANGESETF_prettyprint_hex);

Is there anything the guest can do to cause this rangeset to grow very
large?  Like moving its video RAM around?

> +/*
> + * Resolve deliberately mis-configured (EMT field set to an invalid value)
> + * entries in the page table hierarchy for the given GFN:
> + * - calculate the correct value for the EMT field
> + * - if marked so, re-calculate the P2M type
> + * - propagate EMT and re-calculation flag down to the next page table level
> + *   for entries not involved in the translation of the given GFN
> + */
> +static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>  {

Please document the different return values here as well.

>  
> -    return !!okay;
> +    return rc >= !spurious;

Please just write out this logic in a human-readable way.  The compiler
will DTRT. :)  

>  }
>  
>  /*
> @@ -416,12 +470,11 @@ ept_set_entry(struct p2m_domain *p2m, un
>      unsigned long gfn_remainder = gfn;
>      int i, target = order / EPT_TABLE_ORDER;
>      int rc = 0;
> -    int ret = 0;
>      bool_t direct_mmio = (p2mt == p2m_mmio_direct);
>      uint8_t ipat = 0;
>      int need_modify_vtd_table = 1;
>      int vtd_pte_present = 0;
> -    int needs_sync = 1;
> +    int ret, needs_sync = -1;

Hmmm, another tristate.  I find these hard to follow, esp.  without
comments to say which non-zero value is which.  I wonder if we could
add some sort of framework for these to make them clearer.  Ideally
we'd have an enum-like type and rely on the compiler to DTRT about
optimizing the tests.

In this case, where needs_sync isn't being passed as a return value, I
think we'd be fine with two booleans in place of this int.

And more generally, I would like to see at least a comment on every
new instance of this trick before I ack it in x86/mm code.

Cheers,

Tim.

next prev parent reply	other threads:[~2014-04-24 15:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-22 12:22 [PATCH v2 0/6] x86/P2M: reduce time bulk type changes take Jan Beulich
2014-04-22 12:28 ` [PATCH v2 1/6] x86/EPT: don't walk entire page tables when globally changing types Jan Beulich
2014-04-24 15:41   ` Tim Deegan [this message]
2014-04-24 16:20     ` Jan Beulich
2014-04-25  8:44       ` Tim Deegan
2014-04-25 19:37   ` Konrad Rzeszutek Wilk
2014-04-28  7:19     ` Jan Beulich
2014-04-22 12:29 ` [PATCH v2 2/6] x86/EPT: don't walk entire page tables when changing types on a range Jan Beulich
2014-04-24 16:00   ` Tim Deegan
2014-04-22 12:30 ` [PATCH v2 3/6] x86/P2M: simplify write_p2m_entry() Jan Beulich
2014-04-24 16:00   ` Tim Deegan
2014-04-24 16:22     ` Jan Beulich
2014-04-22 12:30 ` [PATCH v2 4/6] x86/NPT: don't walk entire page tables when changing types on a range Jan Beulich
2014-04-24 16:25   ` Tim Deegan
2014-04-25  6:13     ` Jan Beulich
2014-04-25  8:43       ` Tim Deegan
2014-04-22 12:31 ` [PATCH v2 5/6] x86/NPT: don't walk entire page tables when globally changing types Jan Beulich
2014-04-24 16:28   ` Tim Deegan
2014-04-22 12:32 ` [PATCH v2 6/6] x86/P2M: cleanup Jan Beulich
2014-04-22 12:37   ` Andrew Cooper
2014-04-24 16:30   ` Tim Deegan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140424154103.GH48969@deinos.phlegethon.org \
    --to=tim@xen.org \
    --cc=JBeulich@suse.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=eddie.dong@intel.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).