From: Kai Huang <kai.huang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: andrew.cooper3@citrix.com, kevin.tian@intel.com,
xen-devel@lists.xen.org, keir@xen.org, tim@xen.org
Subject: Re: PML (Page Modification Logging) design for Xen
Date: Thu, 12 Feb 2015 13:16:00 +0800
Message-ID: <54DC3710.6070207@linux.intel.com>
In-Reply-To: <54DC14CB.40305@linux.intel.com>
On 02/12/2015 10:49 AM, Kai Huang wrote:
>
> On 02/11/2015 09:06 PM, Jan Beulich wrote:
>>>>> On 11.02.15 at 09:28, <kai.huang@linux.intel.com> wrote:
>>> - PML enable/disable for particular Domain
>>>
>>> PML needs to be enabled (allocate PML buffer, initialize PML index and
>>> PML base address, turn PML on in VMCS, etc.) for all vcpus of the
>>> domain, as the PML buffer and PML index are per-vcpu, but the EPT table
>>> may be shared by vcpus. Enabling PML on only some vcpus of the domain
>>> won't work. Also, PML will only be enabled for the domain when it is
>>> switched to dirty logging mode, and it will be disabled when the domain
>>> is switched back to normal mode. As it looks like the vcpu number won't
>>> change dynamically while the guest is running (correct me if I am
>>> wrong here), we don't have to consider enabling PML for a newly
>>> created vcpu when the guest is in dirty logging mode.
>>>
>>> After PML is enabled for the domain, we only need to clear the EPT
>>> entry's D-bit for guest memory in dirty logging mode. We achieve this
>>> by checking if PML is enabled for the domain when p2m_ram_rw is changed
>>> to p2m_ram_logdirty, and updating the EPT entry accordingly. However,
>>> for super pages, we still write protect them in case of PML, as we
>>> still need to split a super page into 4K pages in dirty logging mode.
>> While it doesn't matter much for our immediate needs, the
>> documentation isn't really clear about the behavior when a 2M or
>> 1G page gets its D bit set: Wouldn't it be rather useful to the
>> consumer to know of that fact (e.g. by setting some of the lower
>> bits of the PML entry to indicate so)?
> This is a good point. The documentation only tells us the GPA will be
> logged with the last 12 bits cleared. Whether hardware just clears the
> last 12 bits or performs 2M alignment (in the case of a 2M page) is not
> certain. I will confirm this with the hardware team. But as you said,
> it's not related to our immediate needs.

Forgot to say: as far as I can tell, it is certain that the lower 12
bits are cleared, as the specification says the GPA is written to the
log 4K-aligned. It should be possible to ask the hardware team to change
this if necessary, though I am not 100% sure.
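
To make the ambiguity concrete, here is a minimal C sketch of the two
candidate behaviours. The helper names are hypothetical and only for
illustration; the first follows the specification wording quoted above,
the second is the 2M alignment that we are not sure hardware performs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12
#define PAGE_SHIFT_2M 21

/* What the specification wording describes: the low 12 bits are cleared. */
static inline uint64_t pml_logged_gpa_4k(uint64_t gpa)
{
    return gpa & ~((UINT64_C(1) << PAGE_SHIFT_4K) - 1);
}

/* Hypothetical alternative: hardware aligns to the 2M super page instead. */
static inline uint64_t pml_logged_gpa_2m(uint64_t gpa)
{
    return gpa & ~((UINT64_C(1) << PAGE_SHIFT_2M) - 1);
}
```

Either way the low 12 bits of a logged entry are zero, so a consumer
cannot tell from the entry alone whether the write hit a 4K or a 2M
mapping, which is the point Jan raised.
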
Thanks,
-Kai
>>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush the PML buffer. The first place
>>> is the PML buffer full VMEXIT handler (obviously), and the second place
>>> is in paging_log_dirty_op (either peek or clean): vcpus run
>>> asynchronously while paging_log_dirty_op is called from userspace via
>>> hypercall, so it's possible there are dirty GPAs logged in vcpus' PML
>>> buffers that are not yet full. Therefore we had better flush all
>>> vcpus' PML buffers before reporting dirty GPAs to userspace.
>>>
>>> We handle the above two cases by flushing the PML buffer at the
>>> beginning of all VMEXITs. This solves the first case above, and it also
>>> solves the second case, as prior to paging_log_dirty_op, domain_pause
>>> is called, which kicks vcpus (that are in guest mode) out of guest mode
>>> by sending them an IPI, which causes a VMEXIT.
>>>
>>> This also keeps the log-dirty radix tree more up to date, as the PML
>>> buffer is flushed on every VMEXIT rather than only on the PML buffer
>>> full VMEXIT.
>> Is that really efficient? Flushing the buffer only as needed doesn't
>> seem to be a major problem (apart from the usual preemption issue
>> when dealing with guests with very many vCPU-s, but you certainly
>> recall that at this point HVM is still limited to 128).
>>
>> Apart from these two remarks, the design looks okay to me.
> While keeping the log-dirty radix tree more up to date is probably
> irrelevant, I do think we had better flush PML buffers in
> paging_log_dirty_op (both peek and clean) before reporting dirty pages
> to userspace, in which case I think flushing the PML buffer at the
> beginning of VMEXIT is a good idea, as domain_pause does the job
> automatically. I am not sure how many cycles flushing the PML buffer
> will cost, but I think it should be small compared to the VMEXIT
> itself, and can therefore be ignored.
>
> An optimized way would probably be to flush the PML buffer only for the
> external interrupt VMEXIT, which is what domain_pause really triggers,
> rather than at the beginning of all VMEXITs. But as long as the
> overhead of flushing the PML buffer is negligible, this optimization is
> also unnecessary.
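
To give a feeling for how cheap the flush is, below is a rough C sketch
of a per-vcpu flush. It is not the proposed Xen code: struct pml_vcpu,
pml_flush, struct dirty_log and record_gfn are all hypothetical names,
and the state that would really live in the VMCS is modelled as a plain
struct. It assumes, as I read the SDM, a 512-entry buffer whose index
starts at 511 and is decremented after each logged GPA, wrapping out of
range once the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML_ENTRIES 512          /* one 4K page of 8-byte GPA entries */

/* Hypothetical per-vcpu PML state (the real index is a VMCS field). */
struct pml_vcpu {
    uint64_t buf[PML_ENTRIES];
    uint16_t index;              /* 511 when empty, counts down per log */
};

/* Hypothetical stand-in for the log-dirty radix tree. */
struct dirty_log {
    size_t   count;
    uint64_t last_gfn;
};

typedef void (*mark_dirty_fn)(uint64_t gfn, void *ctx);

static void record_gfn(uint64_t gfn, void *ctx)
{
    struct dirty_log *log = ctx;

    log->count++;
    log->last_gfn = gfn;
}

/* Drain all logged GPAs and reset the index; returns entries flushed. */
static size_t pml_flush(struct pml_vcpu *v, mark_dirty_fn mark_dirty, void *ctx)
{
    size_t first, i, n = 0;

    if (v->index == PML_ENTRIES - 1)
        return 0;                /* nothing logged since the last flush */

    /* An out-of-range index means the buffer is full: all entries valid. */
    first = (v->index >= PML_ENTRIES) ? 0 : (size_t)v->index + 1;

    for (i = first; i < PML_ENTRIES; i++) {
        mark_dirty(v->buf[i] >> 12, ctx);   /* logged GPA -> GFN */
        n++;
    }

    v->index = PML_ENTRIES - 1;
    return n;
}
```

In the common case (buffer empty) this is one comparison, and otherwise
at most 512 iterations, which is why I expect it to be small next to the
cost of the VMEXIT itself.
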
>
> Thanks,
> -Kai
>>
>> Jan
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel