From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Ehrhardt Subject: Re: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased Date: Mon, 08 Jun 2009 12:51:26 +0200 Message-ID: <4A2CED2E.6030904@linux.vnet.ibm.com> References: <1243952771-32428-1-git-send-email-ehrhardt@linux.vnet.ibm.com> <1243952771-32428-4-git-send-email-ehrhardt@linux.vnet.ibm.com> <20090605205312.GA13471@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: kvm@vger.kernel.org, avi@redhat.com, borntraeger@de.ibm.com, cotte@de.ibm.com, heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com To: Marcelo Tosatti Return-path: Received: from mtagate2.de.ibm.com ([195.212.17.162]:54012 "EHLO mtagate2.de.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752663AbZFHKv3 (ORCPT ); Mon, 8 Jun 2009 06:51:29 -0400 Received: from d12nrmr1607.megacenter.de.ibm.com (d12nrmr1607.megacenter.de.ibm.com [9.149.167.49]) by mtagate2.de.ibm.com (8.13.1/8.13.1) with ESMTP id n58ApVJx002462 for ; Mon, 8 Jun 2009 10:51:31 GMT Received: from d12av02.megacenter.de.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12nrmr1607.megacenter.de.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n58ApVTb2838718 for ; Mon, 8 Jun 2009 12:51:31 +0200 Received: from d12av02.megacenter.de.ibm.com (loopback [127.0.0.1]) by d12av02.megacenter.de.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n58ApUlc002343 for ; Mon, 8 Jun 2009 12:51:31 +0200 In-Reply-To: <20090605205312.GA13471@amt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: Marcelo Tosatti wrote: > On Tue, Jun 02, 2009 at 04:26:11PM +0200, ehrhardt@linux.vnet.ibm.com= wrote: > =20 >> From: Christian Ehrhardt >> =20 [...] >> @@ -706,13 +713,19 @@ int kvm_arch_set_memory_region(struct kv >> =20 >> /* request update of sie control block for all available vcpus */ >> for (i =3D 0; i < KVM_MAX_VCPUS; ++i) { >> - if (kvm->vcpus[i]) { >> - if (test_and_set_bit(KVM_REQ_MMU_RELOAD, >> - &kvm->vcpus[i]->requests)) >> - continue; >> - kvm_s390_inject_sigp_stop(kvm->vcpus[i], >> - ACTION_VCPUREQUEST_ON_STOP); >> - } >> + vcpu =3D kvm->vcpus[i]; >> + if (!vcpu) >> + continue; >> + >> + if (!test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) >> + continue; >> + >> + if (vcpu->cpu =3D=3D -1) >> + continue; >> =20 > > What happens if the check for cpu =3D=3D -1 races with kvm_arch_vcpu_= put? > This context will wait until the vcpu_put context is scheduled back i= n > to clear the bit? Is that OK? > =20 It either comes back to clear the bit or it is consumed on deletion of=20 the vcpu. Both ways are ok. The question we have to answer is if it=20 might stall the mem update ioctl for too long. Because eventually the check for vcpu->cpu =3D=3D -1 is just an optimiz= ation=20 if we would completely ignore remove it we would have the same problem=20 -> could it stall the set mem operation too much. That means the "race"= =20 is not an issue it might just be sub-optimal, but the chance for a long= =20 stall could become an issue. Unfortunately I have no better approach to= =20 that (yet), until then this I like this implementation more than what w= e=20 would have without all the corner case fixes in that patch series. >> + >> + kvm_s390_inject_sigp_stop(vcpu, ACTION_VCPUREQUEST_ON_STOP); >> + wait_on_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD, >> + wait_bit_schedule, TASK_UNINTERRUPTIBLE); >> } >> =20 > > void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > { > + vcpu->cpu =3D -1; > save_fp_regs(&vcpu->arch.guest_fpregs); > =20 [...] >> +++ kvm/arch/s390/kvm/kvm-s390.h >> @@ -92,6 +92,13 @@ static inline unsigned long kvm_s390_han >> if (!vcpu->requests) >> return 0; >> =20 >> + /* requests that can be handled at all levels */ >> + if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) { >> + smp_mb__after_clear_bit(); >> =20 > > Really need that smp_mb__after_clear_bit ? AFAIK test_and_clear_bit > implies a barrier? > =20 Well I agree that practically test_and_clear_bit has a barrier on s390,= =20 but as far as I read Documentation/atomic_ops.txt at line 339-360 I=20 think the interface does not imply it so I wanted to add it explicitly.= =20 I would be happy if someone really knows the in depth details here and=20 corrects me :-) >> + wake_up_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD); >> + kvm_s390_vcpu_set_mem(vcpu); >> + } >> + >> return vcpu->requests; >> } >> =20 >> Index: kvm/virt/kvm/kvm_main.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- kvm.orig/virt/kvm/kvm_main.c >> +++ kvm/virt/kvm/kvm_main.c >> @@ -1682,6 +1682,10 @@ static int kvm_vcpu_release(struct inode >> { >> struct kvm_vcpu *vcpu =3D filp->private_data; >> =20 >> + clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests); >> + smp_mb__after_clear_bit(); >> + wake_up_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD); >> + >> =20 > > And this should be generic? Say if other architectures want to make u= se=20 > of a similar wait infrastructure. Talk is cheap. > =20 Clear bit and wake up on release doesn't hurt any architecture, but it=20 is at a good place fine for those using the mechanism to ensure cleanin= g=20 up outstanding things when closing a vcpu fd. I thought its not worth to add kvm_ARCH_vcpu_release for it while I=20 could do so if we want it separated. (continued below) > Anyway, yeah, the set request / wait mechanism you implement here is > quite similar to the idea mentioned earlier that could be used for x8= 6. > > Just get rid of this explicit KVM_REQ_MMU_RELOAD knowledge in > arch-independent code please (if you want to see this merged). > =20 I agree to lift the wait part to other archs later if needed, but as=20 mentioned above I could move this to arch code to the cost of one arch=20 hook more. But as also mentioned it doesn't really hurt. I agree that i= t=20 does not need to be KVM_REQ_MMU_RELOAD specific, we could just=20 walk/clear/wake all bits on that vcpu->requests variable. Would that be generic enough in your opinion ? > Later it can all be lifted off to arch independent code. > =20 True for the wait part which can evolve in our arch code until it is=20 ripe to get cross arch merged. --=20 Gr=FCsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization=20