From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kai Huang <kai.huang@linux.intel.com>
Subject: Re: PML (Page Modification Logging) design for Xen
Date: Thu, 12 Feb 2015 10:35:29 +0800
Message-ID: <54DC1171.1030000@linux.intel.com>
References: <54DB129D.3060102@linux.intel.com> <54DB4294.1080406@citrix.com>
	<54DB6392020000780005F08B@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <54DB6392020000780005F08B@mail.emea.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>, Andrew Cooper <andrew.cooper3@citrix.com>
Cc: keir@xen.org, kevin.tian@intel.com, tim@xen.org, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org


On 02/11/2015 09:13 PM, Jan Beulich wrote:
>>>> On 11.02.15 at 12:52, <andrew.cooper3@citrix.com> wrote:
>> On 11/02/15 08:28, Kai Huang wrote:
>>> With PML, we don't have to use write protection but just clear D-bit
>>> of EPT entry of guest memory to do dirty logging, with an additional
>>> PML buffer full VMEXIT for 512 dirty GPAs. Theoretically, this can
>>> reduce hypervisor overhead when guest is in dirty logging mode, and
>>> therefore more CPU cycles can be allocated to guest, so it's expected
>>> benchmarks in guest will have better performance comparing to non-PML.
>> One issue with basic EPT A/D tracking was the scan of the EPT tables.
>> Here, hardware will give us a list of affected gfns, but how is Xen
>> supposed to efficiently clear the dirty bits again?  Using EPT
>> misconfiguration is no better than the existing fault path.
> Why not? The misconfiguration exit ought to clear the D bit for all
> 511 entries in the L1 table (and set it for the one entry that is
> currently serving the access). All further D bit handling will then
> be PML based.
Indeed, we clear the D-bit in the EPT misconfiguration handler. As I
understand it, the sequence is as follows:

1) PML is enabled for the domain.
2) ept_invalidate_emt (or ept_invalidate_emt_range) is called.
3) The guest accesses a GPA that was invalidated in step 2, and an EPT
misconfiguration is triggered.
4) resolve_misconfig is then called. It fixes up the GFN (the above
GPA >> 12) to p2m_ram_logdirty and calls ept_p2m_type_to_flags, which
clears the D-bit of the EPT entry (instead of clearing the W-bit) when
the p2m type is p2m_ram_logdirty. From then on, dirty logging of this
GFN is handled by PML.

Steps 2) to 4) are repeated whenever the log-dirty radix tree is cleared.
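
To illustrate the flag choice in step 4, here is a minimal standalone
sketch. It is not the Xen code: toy_type_to_flags, the toy_p2m_type
enum and the pml parameter are hypothetical names made up for this
example, and pre-setting the A and D bits for plain RAM is an
assumption. Only the EPT bit positions are taken from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT leaf-entry bits as defined by the Intel SDM. */
#define EPT_R (1ULL << 0)   /* read */
#define EPT_W (1ULL << 1)   /* write */
#define EPT_X (1ULL << 2)   /* execute */
#define EPT_A (1ULL << 8)   /* accessed */
#define EPT_D (1ULL << 9)   /* dirty */

/* Hypothetical, reduced set of p2m types, for illustration only. */
enum toy_p2m_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY };

/*
 * Recompute the permission and A/D bits of an EPT entry for a p2m type.
 * The address bits of the entry are left untouched.
 */
static uint64_t toy_type_to_flags(uint64_t entry, enum toy_p2m_type t,
                                  bool pml)
{
    entry &= ~(EPT_R | EPT_W | EPT_X | EPT_A | EPT_D);
    entry |= EPT_R | EPT_X | EPT_A;

    switch (t) {
    case TOY_RAM_RW:
        /* Plain RAM: writable, D pre-set (assumption of this sketch). */
        entry |= EPT_W | EPT_D;
        break;
    case TOY_RAM_LOGDIRTY:
        if (pml)
            entry |= EPT_W;  /* writable, D clear: PML logs the first write */
        /* Without PML: W stays clear, so the first write faults. */
        break;
    default:
        assert(!"unknown toy p2m type");
    }
    return entry;
}
```

With pml set, a log-dirty entry stays writable and only its D-bit is
cleared, so the first guest write is recorded by hardware instead of
taking the write-protection fault path.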

>
>>> - PML buffer flush
>>>
>>> There are two places we need to flush PML buffer. The first place is
>>> PML buffer full VMEXIT handler (apparently), and the second place is
>>> in paging_log_dirty_op (either peek or clean), as vcpus are running
>>> asynchronously along with paging_log_dirty_op is called from userspace
>>> via hypercall, and it's possible there are dirty GPAs logged in vcpus'
>>> PML buffers but not full. Therefore we'd better to flush all vcpus'
>>> PML buffers before reporting dirty GPAs to userspace.
>> Why apparently?  It would be quite easy for a guest to dirty 512 frames
>> without otherwise taking a vmexit.
> I silently replaced apparently with obviously while reading...
>
>>> We handle above two cases by flushing PML buffer at the beginning of
>>> all VMEXITs. This solves the first case above, and it also solves the
>>> second case, as prior to paging_log_dirty_op, domain_pause is called,
>>> which kicks vcpus (that are in guest mode) out of guest mode via
>>> sending IPI, which cause VMEXIT, to them.
>>>
>>> This also makes log-dirty radix tree more updated as PML buffer is
>>> flushed on basis of all VMEXITs but not only PML buffer full VMEXIT.
>> My gut feeling is that this is substantial overhead on a common path,
>> but this largely depends on how the dirty bits can be cleared efficiently.
> I agree on the overhead part, but I don't see what relation this has
> to the dirty bit clearing - a PML buffer flush doesn't involve any
> alterations of D bits.
No, the flush is not related to dirty bit clearing. The PML buffer
flush just does the following (which I should have made clear in my
design, sorry):
1) Read the PML index.
2) Loop over all GPAs logged in the PML buffer, as indicated by the PML
index, and record them in the log-dirty radix tree.

I agree there is overhead on the common VMEXIT path, but it should not
be substantial compared to the overhead of the VMEXIT itself.
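
The two flush steps can be sketched as below. This is a standalone toy
model rather than Xen code: struct toy_vcpu, struct toy_dirty_log and
all toy_* functions are hypothetical, and a flat bitmap stands in for
the log-dirty radix tree. It relies on the hardware behaviour that the
PML index starts at 511 and is decremented after each logged GPA, so
the valid entries are those above the current index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PML_ENTRIES    512u   /* one 4 KiB page of 8-byte GPA entries */
#define TOY_PAGE_SHIFT 12
#define TOY_MAX_GFN    4096u  /* hypothetical size of the toy dirty log */

struct toy_vcpu {
    uint64_t pml_buf[PML_ENTRIES];
    uint16_t pml_index;       /* stand-in for the PML index VMCS field */
};

struct toy_dirty_log {
    uint8_t bitmap[TOY_MAX_GFN / 8];  /* stand-in for the radix tree */
};

static void toy_mark_dirty(struct toy_dirty_log *log, uint64_t gfn)
{
    assert(gfn < TOY_MAX_GFN);
    log->bitmap[gfn / 8] |= (uint8_t)(1u << (gfn % 8));
}

static int toy_is_dirty(const struct toy_dirty_log *log, uint64_t gfn)
{
    return (log->bitmap[gfn / 8] >> (gfn % 8)) & 1;
}

/* Model of the hardware logging one guest write into the PML buffer. */
static int toy_hw_log_write(struct toy_vcpu *v, uint64_t gpa)
{
    if (v->pml_index >= PML_ENTRIES)
        return -1;            /* full: real hardware would VMEXIT here */
    v->pml_buf[v->pml_index] = gpa & ~((1ULL << TOY_PAGE_SHIFT) - 1);
    v->pml_index--;           /* 0 wraps to 0xFFFF, which means "full" */
    return 0;
}

/* Flush: read the index, walk the logged GPAs, record them, reset. */
static unsigned toy_flush_pml(struct toy_vcpu *v, struct toy_dirty_log *log)
{
    uint16_t idx = v->pml_index;
    unsigned first, i, n = 0;

    if (idx == PML_ENTRIES - 1)
        return 0;             /* nothing logged since the last flush */

    /* A wrapped index means all 512 entries are valid. */
    first = (idx >= PML_ENTRIES) ? 0 : idx + 1u;

    for (i = first; i < PML_ENTRIES; i++, n++)
        toy_mark_dirty(log, v->pml_buf[i] >> TOY_PAGE_SHIFT);

    v->pml_index = PML_ENTRIES - 1;
    return n;
}
```

In the common case where nothing has been logged, the flush is one
index read and one compare, and otherwise the loop is bounded by 512
entries.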

Thanks,
-Kai
>
> Jan
>