From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kiszka
Subject: Re: [v6] kvm/fpu: Enable fully eager restore kvm FPU
Date: Thu, 23 Apr 2015 13:25:02 +0200
Message-ID: <5538D68E.4010702@siemens.com>
References: <1429823583-3226-1-git-send-email-liang.z.li@intel.com> <5538CC15.4010005@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: gleb@kernel.org, Marcelo Tosatti, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, joro@8bytes.org, yang.z.zhang@intel.com, Xudong Hao
To: Paolo Bonzini, Liang Li, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Return-path:
In-Reply-To: <5538CC15.4010005@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 2015-04-23 12:40, Paolo Bonzini wrote:
>
>
> On 23/04/2015 23:13, Liang Li wrote:
>> Remove lazy FPU logic and use eager FPU entirely. Eager FPU does
>> not have performance regression, and it can simplify the code.
>>
>> When compiling kernel on westmere, the performance of eager FPU
>> is about 0.4% faster than lazy FPU.
>>
>> Signed-off-by: Liang Li
>> Signed-off-by: Xudong Hao
>
> A patch like this requires much more benchmarking than what you have done.
>
> First, what guest did you use? A modern Linux guest will hardly ever exit
> to userspace: the scheduler uses the TSC deadline timer, which is handled
> in the kernel; the clocksource uses the TSC; virtio-blk devices are kicked
> via ioeventfd.
>
> What happens if you time a Windows guest (without any Hyper-V enlightenments),
> or if you use clocksource=acpi_pm?
>
> Second, "0.4%" by itself may not be statistically significant. How did
> you gather the result? How many times did you run the benchmark? Did
> the guest report any stolen time?
>
>
> And finally, even if the patch was indeed a performance improvement,
> there is much more that you can remove. fpu_active is always 1,
> vmx_fpu_activate only has one call site that can be simplified just to
>
>	vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS;
>	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
>
> and so on.

And it would be good to know what the benchmarks look like on CPUs
other than the chosen Intel model, including older ones.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux