From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
Date: Mon, 31 Aug 2015 21:23:13 +0800
Message-ID: <55E45541.5010206@linux.intel.com>
In-Reply-To: <20150831141932-mutt-send-email-mst@redhat.com>



On 08/31/2015 07:27 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 31, 2015 at 04:32:52PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 08/31/2015 03:46 PM, Michael S. Tsirkin wrote:
>>> On Mon, Aug 31, 2015 at 10:53:58AM +0800, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
>>>>> Even when we skip data decoding, MMIO is slightly slower
>>>>> than port IO because it uses the page-tables, so the CPU
>>>>> must do a pagewalk on each access.
>>>>>
>>>>> This overhead is normally masked by using the TLB cache:
>>>>> but not so for KVM MMIO, where PTEs are marked as reserved
>>>>> and so are never cached.
>>>>>
>>>>> As ioeventfd memory is never read, make it possible to use
>>>>> RO pages on the host for ioeventfds, instead.
>>>>
>>>> I like this idea.
>>>>
>>>>> The result is that TLBs are cached, which finally makes MMIO
>>>>> as fast as port IO.
>>>>
>>>> What does "TLBs are cached" mean? Even after applying the patch
>>>> no new TLB type can be cached.
>>>
>>> The Intel manual says:
>>> 	No guest-physical mappings or combined mappings are created with
>>> 	information derived from EPT paging-structure entries that are not present
>>> 	(bits 2:0 are all 0) or that are misconfigured (see Section 28.2.3.1).
>>>
>>> 	No combined mappings are created with information derived from guest
>>> 	paging-structure entries that are not present or that set reserved bits.
>>>
>>> Thus mappings that result in EPT violation are created, this makes
>>> EPT violation preferable to EPT misconfiguration.
>>
>> Hmm... but your logic completely bypasses page-table installation; the page
>> table entry stays non-present forever for eventfd memory.
>
> As far as I can tell, not really: a non present page is not an EPT violation,

Actually, no.

There are two kinds of EPT VM-exit. One is the EPT violation, which is caused
when the EPT entry is not present or the access permissions are insufficient.
The other is the EPT misconfiguration, which is caused when reserved bits are
set in the EPT entry.
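
To make the distinction concrete, here is a minimal sketch in plain C. It is a
simplified model for illustration only, not KVM code: the names
classify_ept_access(), EPT_R/EPT_W/EPT_X and the caller-supplied rsvd_mask are
hypothetical, and it models only the two causes named above (real hardware has
further misconfiguration conditions).

```c
#include <assert.h>
#include <stdint.h>

/* Permission bits 2:0 of an EPT entry (read, write, execute). */
#define EPT_R   (1ull << 0)
#define EPT_W   (1ull << 1)
#define EPT_X   (1ull << 2)
#define EPT_RWX (EPT_R | EPT_W | EPT_X)

enum ept_exit {
	EPT_EXIT_NONE,       /* access is allowed, no VM-exit */
	EPT_EXIT_VIOLATION,  /* not present, or permission insufficient */
	EPT_EXIT_MISCONFIG,  /* present entry with reserved bits set */
};

/*
 * Classify one guest access against one EPT entry.
 * rsvd_mask is an assumption: the caller says which bits count as reserved.
 * access is a non-empty combination of EPT_R / EPT_W / EPT_X.
 */
static enum ept_exit classify_ept_access(uint64_t entry, uint64_t rsvd_mask,
					 uint64_t access)
{
	uint64_t perm = entry & EPT_RWX;

	assert(access != 0 && (access & ~EPT_RWX) == 0);

	/* Bits 2:0 all zero: the entry is not present. */
	if (perm == 0)
		return EPT_EXIT_VIOLATION;

	/* Present entry with a reserved bit set. */
	if (entry & rsvd_mask)
		return EPT_EXIT_MISCONFIG;

	/* Present and well-formed, but the access is not permitted. */
	if ((perm & access) != access)
		return EPT_EXIT_VIOLATION;

	return EPT_EXIT_NONE;
}
```

In this model a write to a read-only entry, classify_ept_access(EPT_R, 0, EPT_W),
is a violation, and so is any access to an all-zero entry.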

> so at the first write, the regular logic will trigger and install the PTE,
> then the instruction is re-executed and triggers an EPT violation.
>
>
>>>
>>>
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>>   arch/x86/kvm/vmx.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 9d1bfd3..ed44026 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>>>>   		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>>>>>
>>>>>   	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>>>> +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>>>>> +		skip_emulated_instruction(vcpu);
>>>>> +		return 1;
>>>>> +	}
>>>>> +
>>>>
>>>> I am afraid that the common page fault entry point is not a good place to do the
>>>> work.
>>>
>>> Why isn't it?
>>
>> 1) You always do bus_write even if it is a read access. You cannot assume that the
>>     memory region can't be read by the guest.
>>
>> 2) The workload of _bus_write is added to all kinds of page faults; normal #PFs are far
>>     more frequent than #PFs that happen on RO memory.
>
> Normal PF is slow path: you need to hit disk to pull memory from swap,
> etc etc. More importantly, it installs the PTE so the next access
> does not cause an exit at all.
>
> At some level that is the whole point of the patch: we are adding a fast
> path option to what would normally be slow path only, so we aren't
> slowing down anything important.

I have another question: the eventfd memory is never read by the guest, so every
access is a write MMIO VM-exit. Why build it on an RO memslot? Why not just use a
normal MMIO page instead?
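
For point 1) above (bus_write issued even for reads), the fast path could be
gated on the access type. The sketch below is plain C for illustration, not a
patch: struct fast_mmio_range and try_fast_mmio() are hypothetical stand-ins
for the real bus lookup. The only hardware detail assumed is that bits 0, 1
and 2 of the EPT-violation exit qualification report data read, data write
and instruction fetch.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit qualification: access type that caused the exit. */
#define EPT_QUAL_READ  (1ull << 0)
#define EPT_QUAL_WRITE (1ull << 1)
#define EPT_QUAL_FETCH (1ull << 2)

/* Hypothetical stand-in for one registered ioeventfd address range. */
struct fast_mmio_range {
	uint64_t start;
	uint64_t len;
};

static bool range_contains(const struct fast_mmio_range *r, uint64_t gpa)
{
	return gpa >= r->start && gpa - r->start < r->len;
}

/*
 * Returns true when the exit can be completed on the fast path.
 * Reads and instruction fetches are never treated as eventfd kicks,
 * so they fall through to the normal fault handling.
 */
static bool try_fast_mmio(const struct fast_mmio_range *r,
			  uint64_t exit_qualification, uint64_t gpa)
{
	if (!(exit_qualification & EPT_QUAL_WRITE))
		return false;

	return range_contains(r, gpa);
}
```

With this check a read of the same address is not signalled as a write and
still goes through the regular RO-memslot logic.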

>
>> 3) It completely bypasses the logic of handling RO memslots.
>>
>>>
>>>> Could it be moved to kvm_handle_bad_page()? The difference is that the workload of
>>>> fast_page_fault() is included, but it's light enough and MMIO exits should not be
>>>> very frequent, so I think it's okay.
>>>
>>> That was supposed to be a slow path, I doubt it'll work well without
>>> major code restructuring.
>>> IIUC by design everything that's not going through fast_page_fault
>>> is supposed to be slow path that only happens rarely.
>>>
>>
>> Do you have performance numbers comparing this patch with the approach I suggested?
>
> Not yet.
>
>>> But in this case, the page stays read-only, we need a new fast path
>>> through the code.
>>>
>>
>> Another solution is making the MMU recognise an RO region that is write-mostly, then
>> making the page table entry reserved rather than read-only.
>
> Reserved results in EPT misconfiguration, so it's not cached.
>

See my comments above.

In addition: even if the EPT entry is cached, the TLB entry only carries read-only
permission, which does nothing to speed up write accesses.
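
The argument can be written down as a toy model. This is a sketch under
strong assumptions, not a description of real hardware: model_access() and
struct access_cost are hypothetical names, and the model only tracks whether
a page walk is needed and whether the access traps.

```c
#include <stdbool.h>

/* What one guest access costs in this toy model. */
struct access_cost {
	bool page_walk;  /* translation was not cached, walk required */
	bool vm_exit;    /* access is not permitted, hypervisor is entered */
};

/*
 * translation_cached: a TLB entry for the address exists.
 * mapping_writable:   the mapping (and so any cached entry) allows writes.
 * is_write:           the guest access is a write.
 */
static struct access_cost model_access(bool translation_cached,
				       bool mapping_writable, bool is_write)
{
	struct access_cost cost;

	/* Caching only decides whether the walk can be skipped. */
	cost.page_walk = !translation_cached;

	/* A write through a read-only mapping traps, cached or not. */
	cost.vm_exit = is_write && !mapping_writable;

	return cost;
}
```

In this model a write to read-only eventfd memory takes a VM-exit whether or
not the read-only translation is cached; caching only removes the page walk.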
