From: Christian Ehrhardt
Subject: Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
Date: Tue, 12 May 2009 15:33:23 +0200
Message-ID: <4A097AA3.8000007@linux.vnet.ibm.com>
In-Reply-To: <4A095EEA.60902@redhat.com>
References: <1241534358-32172-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1241534358-32172-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <4A017C2E.6060306@redhat.com> <4A08217E.6000102@linux.vnet.ibm.com>
 <4A0824EF.5010201@redhat.com> <4A082C4E.60501@linux.vnet.ibm.com>
 <4A082FF3.4060908@redhat.com> <4A083956.2000200@linux.vnet.ibm.com>
 <4A083DCC.8000805@redhat.com> <4A093E1B.9030008@linux.vnet.ibm.com>
 <4A095EEA.60902@redhat.com>
To: Avi Kivity
Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>> Avi Kivity wrote:
>>> Christian Ehrhardt wrote:
>>>>
>>>> The bad thing about vcpu->requests in that case is that I don't want
>>>> the async behaviour of vcpu->requests here; I want the memory slot
>>>> updated in all vcpus when the ioctl returns.
>>>
>>> You mean, the hardware can access the vcpu control block even when
>>> the vcpu is not running?
>> No, hardware only uses it with a running vcpu, but I realised my own
>> fault while changing the code to the vcpu->request style.
>> For s390 I need to update the kvm->arch and *all* vcpu->arch->sie_block...
>> data synchronously.
>
> Out of interest, can you explain why?

Sure, I'll try to give an example.

a) The whole guest has "one" memory slot representing all its memory.
Therefore some important values like guest_origin and guest_memsize (one
slot, so it's just addr+size) are kept at VM level in kvm->arch.

b) We fortunately have cool hardware support for "nearly everything"(tm)
:-) In this case, for example, we set in vcpu->arch.sie_block the values
for origin and size, translated into a "limit", to get memory management
virtualization support.

c) We have other code, e.g. all our copy_from/to_guest functions, that
uses the kvm->arch values.

If we allowed e.g. updates of a memslot (or, as the patch intends, if we
harden the set_memory_region code against inconsiderate code changes in
other sections), it might happen that we set the kvm->arch information
but not the vcpu->arch->sie_block data until the next reentry. Now,
concurrently, the running vcpu could cause some kind of fault that
involves a copy_from/to_guest.
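To make "update synchronously" a bit more concrete, roughly what I mean
is a sketch like the one below - set the VM-wide values and every vcpu's
SIE control block in one go. The function name and the field names
(guest_origin/guest_memsize in kvm->arch, an origin/limit pair in the
sie_block) are only meant to illustrate the idea, not the exact final
patch:

static void kvm_s390_set_guest_mem(struct kvm *kvm,
				   unsigned long origin,
				   unsigned long size)
{
	int i;

	/* VM-wide copy, used by copy_from/to_guest and friends */
	kvm->arch.guest_origin = origin;
	kvm->arch.guest_memsize = size;

	/* per-vcpu copy, used by the hardware while the vcpu is in SIE */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu)
			continue;
		/* origin and origin+size become the SIE origin/limit pair
		 * (field names illustrative) */
		vcpu->arch.sie_block->gmsor = origin;
		vcpu->arch.sie_block->gmslm = origin + size - 1;
	}
}

Both copies have to change together, with no vcpu able to run in between;
otherwise the following race opens up.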
With such a mismatch we could end up with potentially invalid handling of
that fault (the fault handling and the running guest would use different
userspace addresses until the values are synced on the next vcpu reentry).
It's theoretical, I know, but it might cause issues that would be hard to
find.

On the other hand, for the long term, I wanted to note that all our
copy_from/to_guest functions are per vcpu, so when we some day implement
updateable memslots, multiple memslots, or even just find some "free
time"(tm) to streamline our code, we could redesign that origin/size
storage. This could be done in multiple ways, either storing it per vcpu
or using a lock for the kvm->arch level variables - both ways, and maybe
more, could then use the vcpu->request based approach. Unfortunately that
is neither part of this patch nor of the current effort.

The really good thing is that, because of our discussion about this, I now
have a really detailed idea of how I can improve that code aside from this
bugfix patch (let's hope not too far in the future).

>> That makes the "per vcpu resync on next entry" approach not feasible.
>>
>> On the other hand I realized at the same moment that the livelock
>> should be no issue for us, because as I mentioned:
>> a) only one memslot
>> b) a vcpu can't run without a memslot
>> So I don't even need to kick out vcpus, they just should not be
>> running.
>> Until we ever support multiple slots, or updates of the existing
>> single slot, this should be ok, and so should the bugfix patch be.
>> To avoid a theoretical deadlock in case other code changes (badly), it
>> should be fair to acquire the locks with mutex_trylock and return
>> -EINVAL if we did not get all of them.
>
> OK.

Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
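P.S.: to put the mutex_trylock / -EINVAL idea from the quote above into
code, a rough sketch of what I have in mind for the set_memory_region
path could look like this (the helper names are made up for the
illustration; the loop simply walks the kvm->vcpus[] array):

static int kvm_s390_lock_all_vcpus(struct kvm *kvm)
{
	int i, j;

	/* try to take every vcpu mutex; never block, never kick a vcpu */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		if (!kvm->vcpus[i])
			continue;
		if (!mutex_trylock(&kvm->vcpus[i]->mutex))
			goto fail;
	}
	return 0;

fail:
	/* someone was running - drop what we already took and give up */
	for (j = i - 1; j >= 0; j--)
		if (kvm->vcpus[j])
			mutex_unlock(&kvm->vcpus[j]->mutex);
	return -EINVAL;
}

static void kvm_s390_unlock_all_vcpus(struct kvm *kvm)
{
	int i;

	for (i = 0; i < KVM_MAX_VCPUS; i++)
		if (kvm->vcpus[i])
			mutex_unlock(&kvm->vcpus[i]->mutex);
}

The set_memory_region path would then take all the locks, update
kvm->arch plus every sie_block together, and unlock again; if
kvm_s390_lock_all_vcpus() fails we just pass the -EINVAL back to
userspace without touching anything.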