From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH] KVM: APIC: avoid instruction emulation for EOI writes
Date: Mon, 29 Aug 2011 17:14:24 +0300
Message-ID: <4E5B9EC0.2010808@redhat.com>
References: <625BA99ED14B2D499DC4E29D8138F15063045B0C0C@shsmsx502.ccr.corp.intel.com>
 <4E5B68DA.1090208@siemens.com> <4E5B70F8.30307@redhat.com>
 <4E5B7206.5070603@siemens.com> <4E5B73CC.5080800@redhat.com>
 <4E5B9A3B.2020009@siemens.com>
In-Reply-To: <4E5B9A3B.2020009@siemens.com>
To: Jan Kiszka
Cc: "Tian, Kevin", "kvm@vger.kernel.org", "Nakajima, Jun", "Dong, Eddie", Marcelo Tosatti

On 08/29/2011 04:55 PM, Jan Kiszka wrote:
> On 2011-08-29 13:11, Avi Kivity wrote:
> > On 08/29/2011 02:03 PM, Jan Kiszka wrote:
> >>>
> >>> Just reading the first byte requires a guest page table walk. This is
> >>> probably the highest cost in emulation (which also requires a walk for
> >>> the data access).
> >>
> >> And what about caching the result of the first walk? Usually, a "sane
> >> guest" won't have many code pages that issue the EOI.
> >>
> >
> > There's no way to know when to invalidate the cache.
>
> Set the affected code page read-only?

The virt-phys mapping could change too. And please, don't think of new
reasons to write protect pages, they break up my lovely 2M maps.

> >
> > We could go a bit further, and cache the whole thing. On the first
> > exit, do the entire emulation, and remember %rip. On the second exit,
> > if %rip matches, skip directly to kvm_lapic_eoi().
> >
> > But I don't think it's worth it. This also has failure modes, and
> > really, no guest will ever write to EOI with stosl.
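
For reference, a rough user-space sketch of the %rip caching idea quoted above. This is not KVM code: the struct, enum, and function names are made up for illustration, and the only name taken from the thread, kvm_lapic_eoi(), is represented by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU cache: %rip of the last instruction that was
 * fully emulated as a write to the EOI register. */
struct eoi_rip_cache {
	bool valid;
	uint64_t rip;
};

/* Counters standing in for the real work, so both paths are observable. */
struct eoi_stats {
	unsigned int full_emulations;	/* page table walk, fetch, decode, execute */
	unsigned int eoi_signals;	/* stand-in for calls to kvm_lapic_eoi() */
};

enum eoi_path { EOI_SLOW_PATH, EOI_FAST_PATH };

/* Called on an exit caused by a write to the EOI register (assumed entry
 * point, not a real KVM hook). */
static enum eoi_path handle_eoi_write_exit(struct eoi_rip_cache *cache,
					   struct eoi_stats *st, uint64_t rip)
{
	assert(cache && st);

	if (cache->valid && cache->rip == rip) {
		/* Second and later exits from the same %rip: skip the
		 * emulator and go directly to the EOI. */
		st->eoi_signals++;
		return EOI_FAST_PATH;
	}

	/* First exit from this %rip: do the entire emulation, whose store
	 * reaches the EOI register, then remember %rip. */
	st->full_emulations++;
	st->eoi_signals++;
	cache->valid = true;
	cache->rip = rip;
	return EOI_SLOW_PATH;
}
```

Nothing in this sketch notices when the instruction at the cached %rip is rewritten or when the virt-phys mapping changes, which is exactly the invalidation problem raised above.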
>
> ...or add/sub/and/or etc.

Argh, yes, flags can be updated.

Actually, this might work - if we get a read access first as part of the
RMW, we'll emulate the instruction. No idea what the hardware does in
this case.

> Well, we've done other crazy things in the
> past just to keep even the unlikely case correct. I was just wondering
> if that policy changed.

I can't answer yes to that question. But I see no way to make it work
both fast and correct.

>
> However, I just realized that user space is able to avoid this
> inaccuracy for potentially insane guests by not using in-kernel
> irqchips. So we have at least a knob.

Could/should have a flag to disable this in the kernel as well.

-- 
error compiling committee.c: too many arguments to function