From: Kai Huang <kai.huang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Cc: keir@xen.org, kevin.tian@intel.com, tim@xen.org, xen-devel@lists.xen.org
Subject: Re: PML (Page Modification Logging) design for Xen
Date: Thu, 12 Feb 2015 10:35:29 +0800
Message-ID: <54DC1171.1030000@linux.intel.com>
In-Reply-To: <54DB6392020000780005F08B@mail.emea.novell.com>


On 02/11/2015 09:13 PM, Jan Beulich wrote:
>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>> On 11/02/15 08:28, Kai Huang wrote:
>>> With PML, we don't have to use write protection but just clear D-bit
>>> of EPT entry of guest memory to do dirty logging, with an additional
>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>> benchmarks in guest will have better performance comparing to non-PML.
>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
>> Here, hardware will give us a list of affected gfns, but how is Xen
>> supposed to efficiently clear the dirty bits again?  Using EPT
>> misconfiguration is no better than the existing fault path.
> Why not? The misconfiguration exit ought to clear the D bit for all
> 511 entries in the L1 table (and set it for the one entry that is
> currently serving the access). All further D bit handling will then
> be PML based.
Indeed, we clear the D-bit in the EPT misconfiguration handler. As I
understand it, the sequence is as follows:

1) PML is enabled for the domain.
2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
3) The guest accesses a specific GPA (which has been invalidated by
step 2), and an EPT misconfiguration is triggered.
4) resolve_misconfig is then called, which fixes up the GFN (the above
GPA >> 12) to p2m_ram_logdirty and calls ept_p2m_type_to_flags, in
which we clear the D-bit of the EPT entry (instead of clearing the
W-bit) if the p2m type is p2m_ram_logdirty. From then on, dirty logging
of this GFN is handled by PML.

Steps 2) to 4) are repeated whenever the log-dirty radix tree is cleared.
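
To make step 4) more concrete, here is a rough, self-contained sketch of
the flag decision. It is not the actual ept_p2m_type_to_flags code: the
sketch_* names, the two-value type enum and the pml_enabled parameter
are made up for illustration, and the bit positions (W = bit 1, D =
bit 9) are my reading of the EPT entry layout in the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT leaf entry bits (assumed from the SDM; A/D bits must be enabled). */
#define EPT_W (1ULL << 1)   /* write permission */
#define EPT_D (1ULL << 9)   /* dirty flag */

/* Hypothetical stand-in for the p2m types relevant here. */
enum sketch_p2m_type { SKETCH_RAM_RW, SKETCH_RAM_LOGDIRTY };

/*
 * Return the entry with its flags adjusted for the given type.
 * With PML, a log-dirty page stays writable and only loses its D-bit,
 * so the next write is logged by hardware instead of faulting.
 * Without PML, the page is write-protected as before.
 */
static uint64_t sketch_type_to_flags(uint64_t epte,
                                     enum sketch_p2m_type type,
                                     bool pml_enabled)
{
    switch (type) {
    case SKETCH_RAM_RW:
        return epte | EPT_W;
    case SKETCH_RAM_LOGDIRTY:
        if (pml_enabled)
            return (epte | EPT_W) & ~EPT_D;
        return epte & ~EPT_W;
    }
    assert(!"unknown p2m type");
    return epte;
}
```

For example, sketch_type_to_flags(EPT_W | EPT_D, SKETCH_RAM_LOGDIRTY,
true) gives back an entry with W set and D clear, while the same call
with pml_enabled == false gives back an entry with W clear.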

>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is
>>> PML buffer full VMEXIT handler (apparently), and the second place is
>>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace
>>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>>> PML buffers before reporting dirty GPAs to userspace.
>> Why apparently?  It would be quite easy for a guest to dirty 512 frames
>> without otherwise taking a vmexit.
> I silently replaced apparently with obviously while reading...
>
>>> We handle above two cases by flushing PML buffer at the beginning of
>>> all VMEXITs. This solves the first case above, and it also solves the
>>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>>> which kicks vcpus (that are in guest mode) out of guest mode via
>>> sending IPI, which cause VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is
>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> My gut feeling is that this is substantial overhead on a common path,
>> but this largely depends on how the dirty bits can be cleared efficiently.
> I agree on the overhead part, but I don't see what relation this has
> to the dirty bit clearing - a PML buffer flush doesn't involve any
> alterations of D bits.
No, the flush is not related to dirty bit clearing. The PML buffer
flush just does the following (which I should have clarified in my
design, sorry):
1) Read out the PML index.
2) Loop over all GPAs logged in the PML buffer, as indicated by the PML
index, and record them in the log-dirty radix tree.
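
The two flush steps above could look roughly like the sketch below. It
is only an illustration: the sketch_* names are made up, a small bitmap
stands in for the log-dirty radix tree, and the index handling (the
index starts at 511, is decremented after each logged GPA, and ends up
outside 0..511 once the buffer is full) is my reading of the SDM rather
than something spelled out in the design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML_ENTRIES    512u
#define PAGE_SHIFT     12
#define SKETCH_MAX_GFN 4096u   /* arbitrary limit for this sketch */

/* Bitmap standing in for the log-dirty radix tree. */
struct sketch_dirty_log { uint8_t bits[SKETCH_MAX_GFN / 8]; };

static void sketch_mark_dirty(struct sketch_dirty_log *log, uint64_t gfn)
{
    if (gfn < SKETCH_MAX_GFN)
        log->bits[gfn / 8] |= (uint8_t)(1u << (gfn % 8));
}

static int sketch_is_dirty(const struct sketch_dirty_log *log, uint64_t gfn)
{
    return (gfn < SKETCH_MAX_GFN) && ((log->bits[gfn / 8] >> (gfn % 8)) & 1);
}

/*
 * Drain the logged GPAs into the dirty log and return the index to
 * write back (511, i.e. an empty buffer).
 */
static uint16_t sketch_pml_flush(const uint64_t *buf, uint16_t pml_index,
                                 struct sketch_dirty_log *log)
{
    assert(buf != NULL && log != NULL);

    /* 1) Work out the first valid entry from the PML index. */
    unsigned int first = (pml_index >= PML_ENTRIES) ? 0 : pml_index + 1u;

    /* 2) Record every logged GPA as a dirty GFN. */
    for (unsigned int i = first; i < PML_ENTRIES; i++)
        sketch_mark_dirty(log, buf[i] >> PAGE_SHIFT);

    return PML_ENTRIES - 1;
}
```

With an index of 509, for instance, only entries 510 and 511 are
drained. An index of 511 means nothing has been logged, so the loop
does not run at all, which is the usual case on most VMEXITs.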

I agree there is overhead on the common VMEXIT path, but it should not
be substantial compared to the overhead of the VMEXIT itself.

Thanks,
-Kai
>
> Jan
