From: Kai Huang
Subject: Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
Date: Thu, 05 Feb 2015 14:23:25 +0800
Message-ID: <54D30C5D.1000402@linux.intel.com>
References: <1422413668-3509-1-git-send-email-kai.huang@linux.intel.com> <1422413668-3509-7-git-send-email-kai.huang@linux.intel.com> <20150203151833.GD19731@potion.redhat.com>
In-Reply-To: <20150203151833.GD19731@potion.redhat.com>
To: Radim Krčmář
Cc: pbonzini@redhat.com, gleb@kernel.org, linux@arm.linux.org.uk, kvm@vger.kernel.org

On 02/03/2015 11:18 PM, Radim Krčmář wrote:
> 2015-01-28 10:54+0800, Kai Huang:
>> This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
> (+module_param_named(pml, enable_pml, bool, S_IRUGO);)
>
>> to allow user to enable/disable it manually.
>>
>> Signed-off-by: Kai Huang
>> ---
>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> index 587149b..a139977 100644
>> --- a/arch/x86/kvm/trace.h
>> +++ b/arch/x86/kvm/trace.h
>> @@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
>> +TRACE_EVENT(kvm_pml_full,
>> +	TP_PROTO(unsigned int vcpu_id),
>> +	TP_ARGS(vcpu_id),
>> +
>> +	TP_STRUCT__entry(
>> +		__field( unsigned int, vcpu_id )
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->vcpu_id = vcpu_id;
>> +	),
>> +
>> +	TP_printk("vcpu %d: PML full", __entry->vcpu_id)
>> +);
>> +
>>  #endif /* CONFIG_X86_64 */
> You have it protected by CONFIG_X86_64, but use it unconditionally.

Thanks for catching this. It has been fixed by another patch, and the fix has also been merged by Paolo.
Thanks,
-Kai

> (Also, we can get all this information from the kvm_exit tracepoint;
> I'd just drop it.)
>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c987374..de5ce82 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -516,6 +519,10 @@ struct vcpu_vmx {
>> +	/* Support for PML */
>> +#define PML_ENTITY_NUM	512
> (Readers of this struct likely aren't interested in this number, so it
> would be nicer to define it outside.  I thought it is here to hint the
> amount of allocated pages, but PML_ENTITY_NUM / 512 isn't obvious.)
>
>> +	struct page *pml_pg;
>> @@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>>	   a current VMCS12
>>	   */
>>	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
>> +	/* PML is enabled/disabled in creating/destorying vcpu */
>> +	exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> What is the harm of enabling it here?
>
> (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)

PML feature detection is unconditional, meaning SECONDARY_EXEC_ENABLE_PML is always present in vmcs_config.cpu_based_2nd_exec_ctrl. The PML buffer, however, is only allocated when the vcpu is created, and only if the 'enable_pml' module parameter allows it. If we always enabled SECONDARY_EXEC_ENABLE_PML here, the control bit would be set without any PML buffer existing whenever PML is disabled via 'enable_pml'. So it is better to enable the bit together with creating the PML buffer.

>
>> +
>>	return exec_control;
>>  }
>> @@ -6971,6 +7001,31 @@ static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
>> +static int handle_pml_full(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long exit_qualification;
>> +
>> +	trace_kvm_pml_full(vcpu->vcpu_id);
>> +
>> +	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>> +
>> +	/*
>> +	 * PML buffer FULL happened while executing iret from NMI,
>> +	 * "blocked by NMI" bit has to be set before next VM entry.
>> +	 */
>> +	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
>> +			cpu_has_virtual_nmis() &&
>> +			(exit_qualification & INTR_INFO_UNBLOCK_NMI))
>> +		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>> +				GUEST_INTR_STATE_NMI);
> Relevant part of the specification is pasted from 27.2.2 (Information
> for VM Exits Due to Vectored Events), which makes me think that this bit
> is mirrored to IDT-vectoring information field ...
>
> Isn't vmx_recover_nmi_blocking() enough?
>
> (I see the same code in handle_ept_violation(), but wasn't that needed
> just because of a hardware error?)

This needs to be handled in both the EPT violation and PML paths. The PML specification (the link is in the cover letter) defines bit 12 of the exit qualification in section 1.5, which explains the code above. The same definition of the exit qualification appears in the EPT violation part of Intel's SDM.

>> +
>> +	/*
>> +	 * PML buffer already flushed at beginning of VMEXIT. Nothing to do
>> +	 * here, and there's no userspace involvement needed for PML.
>> +	 */
>> +	return 1;
>> +}
>> @@ -7325,6 +7381,89 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>> +static int vmx_enable_pml(struct vcpu_vmx *vmx)
>> +{
>> +	struct page *pml_pg;
>> +	u32 exec_control;
>> +
>> +	pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!pml_pg)
>> +		return -ENOMEM;
>> +
>> +	vmx->pml_pg = pml_pg;
> (It's safe to use vmx->pml_pg directly.)
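Restating the bit-12 condition from handle_pml_full() above in isolation may help. Below is a user-space sketch, not kernel code: the helper name is hypothetical, and the bit positions mirror the kernel's INTR_INFO_UNBLOCK_NMI (exit qualification bit 12, "NMI unblocking due to IRET") and VECTORING_INFO_VALID_MASK (bit 31 of the IDT-vectoring information).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the kernel's definitions. */
#define EXIT_QUAL_UNBLOCK_NMI	(1ull << 12)	/* NMI unblocking due to IRET */
#define IDT_VECTORING_VALID	(1u << 31)	/* IDT-vectoring info valid bit */

/*
 * Hypothetical helper: true when software must re-set the
 * "blocked by NMI" interruptibility bit before the next VM entry,
 * i.e. the exit interrupted an IRET from an NMI handler while no
 * event was being vectored.
 */
static bool must_restore_nmi_blocking(uint32_t idt_vectoring_info,
				      bool has_virtual_nmis,
				      uint64_t exit_qualification)
{
	return !(idt_vectoring_info & IDT_VECTORING_VALID) &&
	       has_virtual_nmis &&
	       (exit_qualification & EXIT_QUAL_UNBLOCK_NMI) != 0;
}
```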
>
>> +
>> +	vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
>> +	vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +
>> +	exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +	exec_control |= SECONDARY_EXEC_ENABLE_PML;
>> +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
>> +
>> +	return 0;
>> +}
>> +static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
>> +{
>> +	struct kvm *kvm = vmx->vcpu.kvm;
>> +	u64 *pml_buf;
>> +	u16 pml_idx;
>> +
>> +	pml_idx = vmcs_read16(GUEST_PML_INDEX);
>> +
>> +	/* Do nothing if PML buffer is empty */
>> +	if (pml_idx == (PML_ENTITY_NUM - 1))
>> +		return;
>> +
>> +	/* PML index always points to next available PML buffer entity */
>> +	if (pml_idx >= PML_ENTITY_NUM)
>> +		pml_idx = 0;
>> +	else
>> +		pml_idx++;
> If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,

In this case, the log is actually full. The PML index is set to PML_ENTITY_NUM - 1 initially, and hardware decrements it after each GPA is logged.

Below is the pseudocode of the PML hardware logic:

	IF (PML index[15:9] ≠ 0)
		THEN VM exit;
	FI;
	set accessed and dirty flags for EPT;
	IF (a dirty flag was updated from 0 to 1)
		THEN
			PML address[PML index] ← 4-KByte-aligned guest-physical address;
			PML index is decremented;
	FI;

Thanks,
-Kai

> so it would be better to use just 'pml_idx++' and unsigned modulo.
>
> (Could also 'assert(pml_idx < PML_ENTITY_NUM)' then.)
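The index handling that vmx_flush_pml_buffer() starts with can be condensed into a small sketch. This is user-space C, not kernel code, and pml_first_valid_index() is a hypothetical name introduced only for illustration:

```c
#include <stdint.h>

#define PML_ENTITY_NUM 512

/*
 * Hardware starts GUEST_PML_INDEX at PML_ENTITY_NUM - 1 and decrements
 * it after logging each GPA, so:
 *   idx == PML_ENTITY_NUM - 1  -> buffer empty, nothing to flush
 *   idx <  PML_ENTITY_NUM - 1  -> entries [idx + 1, 511] are valid
 *   idx >= PML_ENTITY_NUM      -> index wrapped below 0: buffer full
 * Returns the first valid entry index, or PML_ENTITY_NUM when empty.
 */
static uint16_t pml_first_valid_index(uint16_t pml_idx)
{
	if (pml_idx == PML_ENTITY_NUM - 1)
		return PML_ENTITY_NUM;		/* empty */
	if (pml_idx >= PML_ENTITY_NUM)
		return 0;			/* full: all 512 entries valid */
	return pml_idx + 1;
}
```

The flush loop then walks from the returned index up to PML_ENTITY_NUM - 1, matching the quoted patch.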
>
>> +
>> +	pml_buf = page_address(vmx->pml_pg);
>> +	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
>> +		u64 gpa;
>> +
>> +		gpa = pml_buf[pml_idx];
>> +		WARN_ON(gpa & (PAGE_SIZE - 1));
>> +		mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
>> +	}
>> +
>> +	/* reset PML index */
>> +	vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +}
>> @@ -9492,6 +9655,31 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>> +static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>> +				      struct kvm_memory_slot *slot)
>> +{
>> +	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> (New slot contains dirty pages?)
>
>> +	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
>> +}
> Thanks.
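Coming back to the earlier question of why vmx_secondary_exec_control() clears SECONDARY_EXEC_ENABLE_PML and only vmx_enable_pml() sets it: the intended gating can be sketched as below. This is a user-space condensation, not the patch itself; the function and its boolean arguments are hypothetical, and the bit value follows the SDM's secondary-control bit 17 (enable PML).

```c
#include <stdbool.h>
#include <stdint.h>

/* Secondary processor-based VM-execution control, bit 17: enable PML. */
#define SECONDARY_EXEC_ENABLE_PML	(1u << 17)

/*
 * Hypothetical condensation of the gating: the control bit is only
 * set once a PML buffer actually exists, i.e. at vcpu creation when
 * the 'enable_pml' module parameter is on and the page allocation
 * succeeded.
 */
static uint32_t pml_exec_control(uint32_t exec_control,
				 bool enable_pml_param,
				 bool pml_buffer_allocated)
{
	/* vmx_secondary_exec_control() always clears the bit ... */
	exec_control &= ~SECONDARY_EXEC_ENABLE_PML;

	/* ... and vmx_enable_pml() sets it together with the buffer. */
	if (enable_pml_param && pml_buffer_allocated)
		exec_control |= SECONDARY_EXEC_ENABLE_PML;

	return exec_control;
}
```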