From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH 4/5] KVM: Lazify fpu activation and deactivation Date: Thu, 14 Jan 2010 15:11:53 +0200 Message-ID: <4B4F1819.7040105@redhat.com> References: <1262190342-18611-1-git-send-email-avi@redhat.com> <1262190342-18611-5-git-send-email-avi@redhat.com> <4B4F17F0.7010501@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org, Lucas Meneghel Rodrigues To: Marcelo Tosatti , Sheng Yang , Joerg Roedel Return-path: Received: from mx1.redhat.com ([209.132.183.28]:48426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756396Ab0ANNL5 (ORCPT ); Thu, 14 Jan 2010 08:11:57 -0500 In-Reply-To: <4B4F17F0.7010501@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: (actually copying Lucas). On 01/14/2010 03:11 PM, Avi Kivity wrote: > On 12/30/2009 06:25 PM, Avi Kivity wrote: >> Defer fpu deactivation as much as possible - if the guest fpu is >> loaded, keep >> it loaded until the next heavyweight exit (where we are forced to >> unload it). >> This reduces unnecessary exits. >> >> We also defer fpu activation on clts; while clts signals the intent >> to use the >> fpu, we can't be sure the guest will actually use it. >> > > ... > > >> @@ -4988,6 +4988,10 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) >> return; >> >> vcpu->guest_fpu_loaded = 0; >> + if (vcpu->fpu_active) { >> + vcpu->fpu_active = 0; >> + kvm_x86_ops->fpu_deactivate(vcpu); >> + } >> kvm_fx_save(&vcpu->arch.guest_fx_image); >> kvm_fx_restore(&vcpu->arch.host_fx_image); >> ++vcpu->stat.fpu_reload; > > This is broken badly; kvm_put_guest_fpu() can be called from preempt > notifier context, that is during normal execution of vcpu processing. > Code which modifies the same variables as ->fpu_deactivate() or that > depends on ->fpu_active will break. > > I fixed this by calling ->fpu_deactivate() from a synchronous context > using vcpu->requests, like we do everywhere else. > > Strangely, autotest only caught this on AMD and even it took a while. > Lucas, can you integrate something like the following into autotest, > so we exercise the preemption code harder? > > #!/usr/bin/python > > import sys, os, re, random, ctypes, time > > tasks = sys.argv[1:] > > threads = [int(t) > for k in tasks > for t in os.listdir('/proc/%s/task' % (k,))] > > cpus = [int(c[3:]) > for c in os.listdir('/sys/devices/system/cpu') > if re.match(r'cpu[0-9]+', c)] > > rand = random.Random() > > sched_setaffinity = ctypes.CDLL('libc.so.6').sched_setaffinity > > while True: > pid = rand.choice(threads) > cpu = rand.choice(cpus) > mask = 1 << cpu > sched_setaffinity(ctypes.c_int(pid), ctypes.c_size_t(4), > ctypes.byref(ctypes.c_int(mask))) > try: > time.sleep(0.01) > except: > break > > -- error compiling committee.c: too many arguments to function