From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
Date: Mon, 18 May 2009 01:31:50 +0300
Message-ID: <4A109056.2010003@redhat.com>
References: <1241534358-32172-1-git-send-email-ehrhardt@linux.vnet.ibm.com>      <1241534358-32172-2-git-send-email-ehrhardt@linux.vnet.ibm.com>     <4A017C2E.6060306@redhat.com> <4A08217E.6000102@linux.vnet.ibm.com>    <4A0824EF.5010201@redhat.com> <4A082C4E.60501@linux.vnet.ibm.com>   <4A082FF3.4060908@redhat.com> <4A083956.2000200@linux.vnet.ibm.com>  <4A083DCC.8000805@redhat.com> <4A093E1B.9030008@linux.vnet.ibm.com> <4A095EEA.60902@redhat.com> <4A097AA3.8000007@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm <kvm@vger.kernel.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Carsten Otte <cotte@de.ibm.com>
To: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx2.redhat.com ([66.187.237.31]:42193 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754732AbZEQWbu (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sun, 17 May 2009 18:31:50 -0400
In-Reply-To: <4A097AA3.8000007@linux.vnet.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Christian Ehrhardt wrote:
>>>>> The bad thing on vcpu->request in that case is that I don't want 
>>>>> the async behaviour of vcpu->requests in that case, I want the 
>>>>> memory slot updated in all vcpu's when the ioctl is returning.
>>>>
>>>> You mean, the hardware can access the vcpu control block even when 
>>>> the vcpu is not running? 
>>> No, hardware only uses it with a running vcpu, but I realised my own 
>>> fault while changing the code to vcpu->request style.
>>> For s390 I need to update the KVM->arch and *all* 
>>> vcpu->arch->sie_block... data synchronously.
>>
>> Out of interest, can you explain why?
> Sure I'll try to give an example.
>
> a) The whole guest has "one" memory slot representing all it's memory. 
> Therefore some important values like guest_origin and guest_memsize 
> (one slot so it's just addr+size) are kept at VM level in kvm->arch.

It should really be kept in kvm->memslots[0]->{userspace_addr, npages}.  
This is common to all architectures.

> b) We fortunately have cool hardware support for "nearly 
> everything"(tm) :-) In this case for example we set in 
> vcpu->arch.sie_block the values for origin and size translated into a 
> "limit" to get memory management virtualization support.

x86 has something analogous; shadow or nested page tables are also 
per-vcpu and accessed by the hardware while the guest is running.

> c) we have other code e.g. all our copy_from/to_guest stuff that uses 
> the kvm->arch values

You want to drop these and use kvm_read_guest() / kvm_write_guest().

> If we would allow e.g. updates of a memslot (or as the patch supposes 
> to harden the set_memory_region code against inconsiderate code 
> changes in other sections) it might happen that we set the kvm->arch 
> information but the vcpu->arch->sie_block stuff not until next 
> reentry. Now concurrently the running vcpu could cause some kind of 
> fault that involves a copy_from/to_guest. That way we could end up 
> with potentially invalid handling of that fault (fault handling and 
> running guest would use different userspace adresses until it is 
> synced on next vcpu reentry) - it's theoretical I know, but it might 
> cause some issues that would be hard to find.

I agree it should be protected.  Here's how we do it in arch-independent 
code:

- code that looks up memory slots takes slots_lock for read (future: rcu)
- code that changes memory slots takes slots_lock for write, and 
requests an mmu reload (includes an IPI to force the vcpu out of guest mode)

Now, once we begin changing a slot no one can touch memory or reenter 
the guest until we are done.

> On the other hand for the long term I wanted to note that all our 
> copy_from/to_guest functions is per vcpu, so when we some day 
> implement updateable memslots, multiple memslots or even just fill 
> "free time"(tm) and streamline our code we could redesign that 
> origin/size storage. This could be done multiple ways, either just 
> store it per vcpu or with a lock for the kvm->arch level variables - 
> both ways and maybe more could then use the vcpu->request based 
> approach, but unfortunately it's neither part of that patch nor of the 
> current effort to do that.

I think we should keep that in generic code.  All of that applies to x86 
(and ia64 and ppc), if I understand you correctly, and if I understand 
the other archs correctly (don't place a large bet).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.