From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Ehrhardt Subject: Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state Date: Thu, 28 May 2009 09:59:36 +0200 Message-ID: <4A1E4468.1050309@linux.vnet.ibm.com> References: <1243251652-27617-1-git-send-email-ehrhardt@linux.vnet.ibm.com> <1243251652-27617-2-git-send-email-ehrhardt@linux.vnet.ibm.com> <20090525202248.GA7608@amt.cnet> <4A1BA233.6080504@linux.vnet.ibm.com> <20090528034412.GA6090@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: kvm@vger.kernel.org, avi@redhat.com, borntraeger@de.ibm.com, cotte@de.ibm.com, heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com To: Marcelo Tosatti Return-path: Received: from mtagate4.de.ibm.com ([195.212.29.153]:55626 "EHLO mtagate4.de.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760223AbZE1H7k (ORCPT ); Thu, 28 May 2009 03:59:40 -0400 Received: from d12nrmr1607.megacenter.de.ibm.com (d12nrmr1607.megacenter.de.ibm.com [9.149.167.49]) by mtagate4.de.ibm.com (8.14.3/8.13.8) with ESMTP id n4S7xfVx074572 for ; Thu, 28 May 2009 07:59:41 GMT Received: from d12av01.megacenter.de.ibm.com (d12av01.megacenter.de.ibm.com [9.149.165.212]) by d12nrmr1607.megacenter.de.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n4S7xfuj2838564 for ; Thu, 28 May 2009 09:59:41 +0200 Received: from d12av01.megacenter.de.ibm.com (loopback [127.0.0.1]) by d12av01.megacenter.de.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n4S7xfZG015815 for ; Thu, 28 May 2009 09:59:41 +0200 In-Reply-To: <20090528034412.GA6090@amt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: Marcelo Tosatti wrote: > On Tue, May 26, 2009 at 10:02:59AM +0200, Christian Ehrhardt wrote: > =20 >> Marcelo Tosatti wrote: >> =20 >>> On Mon, May 25, 2009 at 01:40:49PM +0200, ehrhardt@linux.vnet.ibm.c= om wrote: >>> =20 >>> =20 >>>> From: Christian Ehrhardt >>>> >>>> To ensure vcpu's come out of guest context in certain cases this p= atch adds a >>>> s390 specific way to kick them out of guest context. Currently it = kicks them >>>> out to rerun the vcpu_run path in the s390 code, but the mechanism= itself is >>>> expandable and with a new flag we could also add e.g. kicks to use= rspace etc. >>>> >>>> Signed-off-by: Christian Ehrhardt >>>> =20 >>>> =20 >>> "For now I added the optimization to skip kicking vcpus out of gues= t >>> that had the request bit already set to the s390 specific loop (sen= t as >>> v2 in a few minutes). >>> >>> We might one day consider standardizing some generic kickout levels= e.g. >>> kick to "inner loop", "arch vcpu run", "generic vcpu run", "userspa= ce", >>> ... whatever levels fit *all* our use cases. And then let that kick= s be >>> implemented in an kvm_arch_* backend as it might be very different = how >>> they behave on different architectures." >>> >>> That would be ideal, yes. Two things make_all_requests handles:=20 >>> >>> 1) It disables preemption with get_cpu(), so it can reliably check = for >>> cpu id. Somehow you don't need that for s390 when kicking multiple >>> vcpus? >>> =20 >>> =20 >> I don't even need the cpuid as make_all_requests does, I just insert= a =20 >> special bit in the vcpu arch part and the vcpu will "come out to me=20 >> (host)". >> Fortunateley the kick is rare and fast so I can just insert it =20 >> unconditionally (it's even ok to insert it if the vcpu is not in gue= st =20 >> state). That prevents us from needing vcpu lock or detailed checks w= hich =20 >> would end up where we started (no guarantee that vcpu's come out of = =20 >> guest context while trying to aquire all vcpu locks) >> =20 > > Let me see if I get this right: you kick the vcpus out of guest mode = by > using a special bit in the vcpu arch part. OK. > > What I don't understand is this:=20 > "would end up where we started (no guarantee that vcpu's come out of > guest context while trying to aquire all vcpu locks)" > =20 initially the mechanism looped over vcpu's and just aquired the vcpu=20 lock and then updated the vcpu.arch infor directly. Avi mentioned that we have no guarantee if/when the vcpu will come out=20 of guest context to free a lock currently held and suggested the=20 mechanism x86 uses via setting vcpu->request and kicking the vcpu. That= s=20 the eason behind "end up where we (the discussion) started", if we woul= d=20 need the vcpu lock again we would be at the beginnign of the discussion= =2E > So you _need_ a mechanism to kick all vcpus out of guest mode? > =20 I have a mechanism to kick a vcpu, and I use it. Due to the fact that=20 smp_call_* don't work as kick for us the kick is an arch specific funct= ion. I hop ethat clarified this part :-) >>> 2) It uses smp_call_function_many(wait=3D1), which guarantees that = by the >>> time make_all_requests returns no vcpus will be using stale data (t= he >>> remote vcpus will have executed ack_flush). >>> =20 >>> =20 >> yes this is really a part my s390 implementation doesn't fulfill yet= =2E =20 >> Currently on return vcpus might still use the old memslot informatio= n. >> As mentioned before letting all interrupts come "too far" out of the= hot =20 >> loop would be a performance issue, therefore I think I will need som= e =20 >> request&confirm mechanism. I'm not sure yet but maybe it could be as= =20 >> easy as this pseudo code example: >> >> # in make_all_requests >> # remember we have slots_lock write here and the reentry that update= s =20 >> the vcpu specific data aquires slots_lock for read. >> loop vcpus >> set_bit in vcpu requests >> kick vcpu #arch function >> endloop >> >> loop vcpus >> until the requested bit is disappeared #as the reentry path uses =20 >> test_and_clear it will disappear >> endloop >> >> That would be a implicit synchronization and should work, as I wrote= =20 >> before setting memslots while the guest is running is rare if ever =20 >> existant for s390. On x86 smp_call_many could then work without the = wait =20 >> flag being set. >> =20 > > I see, yes.=20 > > =20 >> But I assume that this synchronization approach is slower as it =20 >> serializes all vcpus on reentry (they wait for the slots_lock to get= =20 >> dropped), therefore I wanted to ask how often setting memslots on =20 >> runtime will occur on x86 ? Would this approach be acceptable ? >> =20 > > For x86 we need slots_lock for two things: > > 1) to protect the memslot structures from changing (very rare), ie: > kvm_set_memory. > > 2) to protect updates to the dirty bitmap (operations on behalf of > guest) which take slots_lock for read versus updates to that dirty > bitmap (an ioctl that retrieves what pages have been dirtied in the > memslots, and clears the dirtyness info). > > All you need for S390 is 1), AFAICS. > =20 correct > For 1), we can drop the slots_lock usage, but instead create an > explicit synchronization point, where all vcpus are forced to (say > kvm_vcpu_block) "paused" state. qemu-kvm has such notion. > > Same language? > =20 Yes, I think i got your point :-) But I think by keeping slots_lock we already got our synchronization=20 point and don't need an explicit one adding extra code and maybe locks. As I mentioned above it should synchronize already implicit. When I=20 looked at it once more yesterday I realized that kvm_set_memory is not=20 performance critical anyway (i.e. does not have to be the fastest ioctl= =20 on earth) so we could be one step smarter and instead of serializing al= l=20 vcpu's among each other we could set_memory just let do it one by one. In case I lost you again due to my obviously confusing mainframe=20 language this week you might want to my next patch submission where I=20 implement that in the s390 arch code as an example. I'll put you on cc=20 and in that new code we might find an implicit language synchronization= =20 for us :-) [...] --=20 Gr=FCsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization=20