From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 07 Nov 2013 20:12:27 +0100
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: lcapitulino@redhat.com, qemu-devel@nongnu.org, marcel.a@redhat.com
Message-ID: <527BE61B.3010309@redhat.com>
In-Reply-To: <20131107185413.GA4974@redhat.com>

On 07/11/2013 19:54, Michael S. Tsirkin wrote:
> On Thu, Nov 07, 2013 at 06:29:40PM +0100, Paolo Bonzini wrote:
>> On 07/11/2013 17:47, Michael S. Tsirkin wrote:
>>> That's on kvm with 52 bit address.
>>> But where I would be concerned is systems with e.g. 36 bit address
>>> space where we are doubling the cost of the lookup.
>>> E.g. try i386 and not x86_64.
>>
>> Tried now...
>>
>>                P_L2_LEVELS
>>             pre-patch  post-patch
>>    i386         3          6
>>    x86_64       4          6
>>
>> I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG.
>> With TCG there's indeed a visible penalty of 20 cycles for i386 and 10
>> for x86_64 (you can extrapolate to 30 cycles for
>> TARGET_PHYS_ADDR_SPACE_BITS=32 targets).  These can be more or less
>> entirely ascribed to phys_page_find:
>>
>>                                     TCG              |        KVM
>>                              pre-patch  post-patch   | pre-patch  post-patch
>>  phys_page_find(i386)           13%        25%       |   0.6%        1%
>>  inl_from_qemu cycles(i386)     153        173       |  ~12000     ~12000
>
> I'm a bit confused by the numbers above.  The % of phys_page_find has
> grown from 13% to 25% (almost double, which is kind of expected given
> we have twice the # of levels).

Yes.

> But overhead in # of cycles only went from 153 to 173?

That is expected too, because phys_page_find is only part of the cost:

   new cycles / old cycles = 173 / 153 = 113%

   % outside phys_page_find + % in phys_page_find * 2 = 87% + 13% * 2 = 113%

> Maybe the test is a bit wrong for tcg - how about unrolling the
> loop in kvm unit test?

Done that already. :)
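For reference, the kind of unrolled timing loop in question looks roughly
like the sketch below.  The port number and iteration count are purely
illustrative, not the values vmexit.flat actually uses, and it has to run
at ring 0 inside the guest (as the kvm-unit-tests do) for the port access
to be allowed:

#include <stdio.h>

#define PORT 0xe0     /* illustrative; not the port the real test uses */
#define N    100000   /* illustrative iteration count */

static inline unsigned inl(unsigned short port)
{
    unsigned val;
    asm volatile("inl %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static inline unsigned long long rdtsc(void)
{
    unsigned lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    unsigned long long start = rdtsc();
    int i;

    /* Unrolled 4x so that loop control contributes as little as
       possible to the measured per-inl cycle count. */
    for (i = 0; i < N; i += 4) {
        inl(PORT);
        inl(PORT);
        inl(PORT);
        inl(PORT);
    }
    printf("cycles per inl: %llu\n", (rdtsc() - start) / N);
    return 0;
}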
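And in case the level counts in the first table look magic, they fall out
of the P_L2_LEVELS arithmetic.  A minimal sketch, assuming the radix tree
resolves 10 bits per level above a 12-bit page offset (P_L2_BITS = 10,
TARGET_PAGE_BITS = 12) and that post-patch the tree is sized for a full
64-bit physical address space:

#include <stdio.h>

#define P_L2_BITS        10   /* bits resolved per tree level */
#define TARGET_PAGE_BITS 12   /* 4K pages */

/* Levels needed to cover the given physical address width. */
static int p_l2_levels(int phys_addr_space_bits)
{
    return (phys_addr_space_bits - TARGET_PAGE_BITS - 1) / P_L2_BITS + 1;
}

int main(void)
{
    /* Pre-patch the tree covers the target's physical address space
       (36 bits for i386, 52 for x86_64, per the numbers above);
       post-patch it covers 64 bits on both. */
    printf("i386:   %d -> %d\n", p_l2_levels(36), p_l2_levels(64));  /* 3 -> 6 */
    printf("x86_64: %d -> %d\n", p_l2_levels(52), p_l2_levels(64));  /* 4 -> 6 */
    return 0;
}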
>> Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
>> as suggested a while ago by rth, is already giving a savings of 20 cycles.
>
> Is it true that with TCG this affects more than just MMIO
> as phys_page_find will also sometimes run on CPU accesses to memory?

Yes.  I tried benchmarking the boot of a RHEL guest with perf, and
phys_page_find comes out as follows:

             TCG              |        KVM
      pre-patch  post-patch   | pre-patch  post-patch
         3%        5.8%       |   0.9%       1.7%

This is actually higher than usual for KVM because there are many VGA
accesses during GRUB.

>> And of course, if this were a realistic test, KVM's 60x penalty would
>> be a severe problem---but it isn't, because this is not a realistic
>> setting.
>
> Well, for this argument to carry the day we'd need to design
> a realistic test which isn't easy :)

Yes, I guess the number that matters is the extra 2% penalty for TCG
(the part that doesn't come from MMIO).

Paolo