From: Christian Ehrhardt
Subject: Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
Date: Tue, 12 May 2009 15:33:23 +0200
Message-ID: <4A097AA3.8000007@linux.vnet.ibm.com>
In-Reply-To: <4A095EEA.60902@redhat.com>
References: <1241534358-32172-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1241534358-32172-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <4A017C2E.6060306@redhat.com> <4A08217E.6000102@linux.vnet.ibm.com>
 <4A0824EF.5010201@redhat.com> <4A082C4E.60501@linux.vnet.ibm.com>
 <4A082FF3.4060908@redhat.com> <4A083956.2000200@linux.vnet.ibm.com>
 <4A083DCC.8000805@redhat.com> <4A093E1B.9030008@linux.vnet.ibm.com>
 <4A095EEA.60902@redhat.com>
To: Avi Kivity
Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>> Avi Kivity wrote:
>>> Christian Ehrhardt wrote:
>>>>
>>>> The bad thing about vcpu->requests in that case is that I don't want
>>>> the async behaviour of vcpu->requests here; I want the memory slot
>>>> updated in all vcpus when the ioctl returns.
>>>
>>> You mean, the hardware can access the vcpu control block even when
>>> the vcpu is not running?
>> No, hardware only uses it with a running vcpu, but I realised my own
>> fault while changing the code to the vcpu->request style.
>> For s390 I need to update the kvm->arch and *all* vcpu->arch->sie_block...
>> data synchronously.
>
> Out of interest, can you explain why?

Sure, I'll try to give an example.

a) The whole guest has "one" memory slot representing all its memory.
Therefore some important values like guest_origin and guest_memsize (one
slot, so it's just addr+size) are kept at VM level in kvm->arch.

b) We fortunately have cool hardware support for "nearly everything"(tm)
:-) In this case, for example, we set in vcpu->arch.sie_block the values
for origin and size, translated into a "limit", to get memory management
virtualization support.

c) We have other code, e.g. all our copy_from/to_guest functions, that
uses the kvm->arch values.

If we allowed e.g. updates of a memslot (or, as the patch intends, if we
harden the set_memory_region code against inconsiderate code changes in
other sections), it might happen that we set the kvm->arch information
but not the vcpu->arch->sie_block data until the next reentry. Now,
concurrently, the running vcpu could cause some kind of fault that
involves a copy_from/to_guest.
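To make "update synchronously" a bit more concrete, roughly what I mean
is a sketch like the one below - set the VM-wide values and every vcpu's
SIE control block in one go. The function name and the field names
(guest_origin/guest_memsize in kvm->arch, an origin/limit pair in the
sie_block) are only meant to illustrate the idea, not the exact final
patch:

static void kvm_s390_set_guest_mem(struct kvm *kvm,
				   unsigned long origin,
				   unsigned long size)
{
	int i;

	/* VM-wide copy, used by copy_from/to_guest and friends */
	kvm->arch.guest_origin = origin;
	kvm->arch.guest_memsize = size;

	/* per-vcpu copy, used by the hardware while the vcpu is in SIE */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu)
			continue;
		/* origin and origin+size become the SIE origin/limit pair
		 * (field names illustrative) */
		vcpu->arch.sie_block->gmsor = origin;
		vcpu->arch.sie_block->gmslm = origin + size - 1;
	}
}

Both copies have to change together, with no vcpu able to run in between;
otherwise the following race opens up.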
With such a mismatch we could end up with potentially invalid handling of
that fault (the fault handling and the running guest would use different
userspace addresses until the values are synced on the next vcpu reentry).
It's theoretical, I know, but it might cause issues that would be hard to
find.

On the other hand, for the long term, I wanted to note that all our
copy_from/to_guest functions are per vcpu, so when we some day implement
updateable memslots, multiple memslots, or even just find some "free
time"(tm) to streamline our code, we could redesign that origin/size
storage. This could be done in multiple ways, either storing it per vcpu
or using a lock for the kvm->arch level variables - both ways, and maybe
more, could then use the vcpu->request based approach. Unfortunately that
is neither part of this patch nor of the current effort.

The really good thing is that, because of our discussion about this, I now
have a really detailed idea of how I can improve that code aside from this
bugfix patch (let's hope not too far in the future).

>> That makes the "per vcpu resync on next entry" approach not feasible.
>>
>> On the other hand I realized at the same moment that the livelock
>> should be no issue for us, because as I mentioned:
>> a) only one memslot
>> b) a vcpu can't run without a memslot
>> So I don't even need to kick out vcpus, they just should not be
>> running.
>> Until we ever support multiple slots, or updates of the existing
>> single slot, this should be ok, and so should the bugfix patch be.
>> To avoid a theoretical deadlock in case other code changes (badly), it
>> should be fair to acquire the locks with mutex_trylock and return
>> -EINVAL if we did not get all of them.
>
> OK.

Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
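P.S.: to put the mutex_trylock / -EINVAL idea from the quote above into
code, a rough sketch of what I have in mind for the set_memory_region
path could look like this (the helper names are made up for the
illustration; the loop simply walks the kvm->vcpus[] array):

static int kvm_s390_lock_all_vcpus(struct kvm *kvm)
{
	int i, j;

	/* try to take every vcpu mutex; never block, never kick a vcpu */
	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		if (!kvm->vcpus[i])
			continue;
		if (!mutex_trylock(&kvm->vcpus[i]->mutex))
			goto fail;
	}
	return 0;

fail:
	/* someone was running - drop what we already took and give up */
	for (j = i - 1; j >= 0; j--)
		if (kvm->vcpus[j])
			mutex_unlock(&kvm->vcpus[j]->mutex);
	return -EINVAL;
}

static void kvm_s390_unlock_all_vcpus(struct kvm *kvm)
{
	int i;

	for (i = 0; i < KVM_MAX_VCPUS; i++)
		if (kvm->vcpus[i])
			mutex_unlock(&kvm->vcpus[i]->mutex);
}

The set_memory_region path would then take all the locks, update
kvm->arch plus every sie_block together, and unlock again; if
kvm_s390_lock_all_vcpus() fails we just pass the -EINVAL back to
userspace without touching anything.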