From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Subject: Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of	guest
 state
Date: Tue, 26 May 2009 10:02:59 +0200
Message-ID: <4A1BA233.6080504@linux.vnet.ibm.com>
References: <1243251652-27617-1-git-send-email-ehrhardt@linux.vnet.ibm.com> <1243251652-27617-2-git-send-email-ehrhardt@linux.vnet.ibm.com> <20090525202248.GA7608@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: kvm@vger.kernel.org, avi@redhat.com, borntraeger@de.ibm.com,
	cotte@de.ibm.com, heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mtagate1.uk.ibm.com ([194.196.100.161]:50669 "EHLO
	mtagate1.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752105AbZEZIDG (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 26 May 2009 04:03:06 -0400
Received: from d06nrmr1407.portsmouth.uk.ibm.com (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185])
	by mtagate1.uk.ibm.com (8.13.1/8.13.1) with ESMTP id n4Q8359i026327
	for <kvm@vger.kernel.org>; Tue, 26 May 2009 08:03:05 GMT
Received: from d06av01.portsmouth.uk.ibm.com (d06av01.portsmouth.uk.ibm.com [9.149.37.212])
	by d06nrmr1407.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n4Q835po1048686
	for <kvm@vger.kernel.org>; Tue, 26 May 2009 09:03:05 +0100
Received: from d06av01.portsmouth.uk.ibm.com (loopback [127.0.0.1])
	by d06av01.portsmouth.uk.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n4Q834O2016459
	for <kvm@vger.kernel.org>; Tue, 26 May 2009 09:03:05 +0100
In-Reply-To: <20090525202248.GA7608@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Marcelo Tosatti wrote:
> On Mon, May 25, 2009 at 01:40:49PM +0200, ehrhardt@linux.vnet.ibm.com wrote:
>
>> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>>
>> To ensure vcpu's come out of guest context in certain cases this patch adds a
>> s390 specific way to kick them out of guest context. Currently it kicks them
>> out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
>> expandable and with a new flag we could also add e.g. kicks to userspace etc.
>>
>> Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>>
>
> "For now I added the optimization to skip kicking vcpus out of guest
> that had the request bit already set to the s390 specific loop (sent as
> v2 in a few minutes).
>
> We might one day consider standardizing some generic kickout levels e.g.
> kick to "inner loop", "arch vcpu run", "generic vcpu run", "userspace",
> ... whatever levels fit *all* our use cases. And then let that kicks be
> implemented in an kvm_arch_* backend as it might be very different how
> they behave on different architectures."
>
> That would be ideal, yes. Two things make_all_requests handles:
>
> 1) It disables preemption with get_cpu(), so it can reliably check for
> cpu id. Somehow you don't need that for s390 when kicking multiple
> vcpus?
>
I don't even need the cpu id the way make_all_requests does. I just set a
special bit in the arch part of the vcpu and the vcpu will "come out to me
(host)". Fortunately the kick is rare and fast, so I can insert it
unconditionally (it is even ok to insert it if the vcpu is not in guest
state). That spares us the vcpu lock or detailed checks, which would put
us back where we started (no guarantee that vcpus come out of guest
context while we try to acquire all vcpu locks).
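
A minimal sketch of the unconditional kick described above, written as
plain C11 rather than kernel code. The names struct vcpu_arch,
VCPU_FLAG_KICK, kick_vcpu and vcpu_consume_kick are hypothetical and only
serve as illustration; the real s390 flag and helpers are not shown in
this mail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; stands in for the "special bit" in the arch part. */
#define VCPU_FLAG_KICK 0x1u

/* Hypothetical stand-in for the arch specific part of a vcpu. */
struct vcpu_arch {
	atomic_uint flags;
};

/*
 * Kicker side: set the bit unconditionally. No vcpu lock is taken and
 * there is no check whether the vcpu is currently in guest state.
 * Setting the bit twice is harmless.
 */
static void kick_vcpu(struct vcpu_arch *arch)
{
	assert(arch != NULL);
	atomic_fetch_or(&arch->flags, VCPU_FLAG_KICK);
}

/*
 * Vcpu side: called when the vcpu has come out to the host. Clears the
 * bit and reports whether a kick was pending.
 */
static bool vcpu_consume_kick(struct vcpu_arch *arch)
{
	unsigned int old;

	assert(arch != NULL);
	old = atomic_fetch_and(&arch->flags, ~VCPU_FLAG_KICK);
	return (old & VCPU_FLAG_KICK) != 0;
}
```

Because the kicker only ever sets one bit atomically, several kicks
collapse into one and a kick to a vcpu that is not in guest state is
simply consumed the next time that vcpu checks the flag.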

> 2) It uses smp_call_function_many(wait=1), which guarantees that by the
> time make_all_requests returns no vcpus will be using stale data (the
> remote vcpus will have executed ack_flush).
>
Yes, this is the part my s390 implementation doesn't fulfill yet.
Currently, on return, vcpus might still use the old memslot information.
As mentioned before, letting all interrupts come "too far" out of the hot
loop would be a performance issue, so I think I will need some
request&confirm mechanism. I'm not sure yet, but maybe it could be as
easy as this pseudo code example:

# in make_all_requests
# remember: we hold slots_lock for write here, and the reentry path that
# updates the vcpu specific data acquires slots_lock for read.
loop vcpus
  set_bit in vcpu requests
  kick vcpu   # arch function
endloop

loop vcpus
  wait until the requested bit has disappeared
  # the reentry path uses test_and_clear, so the bit will disappear
endloop
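
The pseudo code above could look roughly like this in plain C11. This is
only a sketch under assumptions: struct vcpu, REQ_RELOAD,
requests_pending, wait_for_confirm and vcpu_reentry are hypothetical
names, the arch kick is reduced to setting a flag, and slots_lock is left
out completely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_RELOAD 0x1u	/* hypothetical request bit */

/* Hypothetical, stripped down vcpu. */
struct vcpu {
	atomic_uint requests;	/* generic request bits */
	atomic_uint arch_kick;	/* stand-in for the arch specific kick */
};

/* First loop: set the request bit and kick every vcpu. */
static void make_all_requests(struct vcpu *vcpus, size_t n)
{
	assert(vcpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		atomic_fetch_or(&vcpus[i].requests, REQ_RELOAD);
		atomic_store(&vcpus[i].arch_kick, 1u);
	}
}

/* Number of vcpus that have not yet confirmed the request. */
static size_t requests_pending(struct vcpu *vcpus, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++)
		if (atomic_load(&vcpus[i].requests) & REQ_RELOAD)
			pending++;
	return pending;
}

/* Second loop: busy wait until every request bit has disappeared. */
static void wait_for_confirm(struct vcpu *vcpus, size_t n)
{
	while (requests_pending(vcpus, n) != 0)
		;	/* kernel code would relax the cpu here */
}

/*
 * Reentry path of a vcpu: test and clear the request bit. Returns true
 * if the vcpu has to reload its vcpu specific (memslot) data.
 */
static bool vcpu_reentry(struct vcpu *v)
{
	unsigned int old;

	atomic_store(&v->arch_kick, 0u);
	old = atomic_fetch_and(&v->requests, ~REQ_RELOAD);
	return (old & REQ_RELOAD) != 0;
}
```

The requester would call make_all_requests() followed by
wait_for_confirm(), while each vcpu calls vcpu_reentry() on its way back
into the guest. The confirmation is implicit: nothing is sent back, the
cleared bit is the acknowledgement.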

That would be an implicit synchronization and should work; as I wrote
before, setting memslots while the guest is running is rare, if it happens
at all, on s390. On x86 smp_call_function_many could then work without the
wait flag being set.
But I assume that this synchronization approach is slower, as it
serializes all vcpus on reentry (they wait for slots_lock to be
dropped). Therefore I wanted to ask how often memslots are set at
runtime on x86. Would this approach be acceptable?

If it is too adventurous for now, I can implement it that way in the s390
code and we split off the long term discussion (synchronization + generic
kickout levels + who knows what comes up).

> If smp_call_function_many is hidden behind kvm_arch_kick_vcpus, can you
> make use of make_all_requests for S390 (without the smp_call_function
> performance impact you mentioned) ?
>
In combination with the request&confirm mechanism described above it
should work, provided smp_call_function and all the cpu id gathering that
belongs to it are hidden behind kvm_arch_kick_vcpus.

> For x86 we can further optimize make_all_requests by checking REQ_KICK,
> and kvm_arch_kick_vcpus would be a good place for that.
>
> And the kickout levels idea you mentioned can come later, as an
> optimization?
Yes, I agree that splitting that off into a later optimization is a good idea.


-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization