Re: [PATCH 6/13] KVM: memory slot management

From: Avi Kivity <avi@qumranet.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linux-kernel@vger.kernel.org, kvm-devel@lists.sourceforge.net
Subject: Re: [PATCH 6/13] KVM: memory slot management
Date: Fri, 27 Oct 2006 15:26:03 +0200
Message-ID: <454208EB.7080007@qumranet.com>
In-Reply-To: <200610270937.11646.arnd@arndb.de>

Arnd Bergmann wrote:
> On Friday 27 October 2006 07:47, Avi Kivity wrote:
>   
>> Arnd Bergmann wrote:
>>     
>>> - no need to preallocate memory that the guest doesn't actually use.
>>>   
>>>       
>> Well, a fully virtualized guest will likely use all the memory it gets.
>> Linux certainly will.
>>     
>
> Only if it does lots of disk accesses that load stuff into
> page/inode/dentry cache. Single-application guests don't necessarily
> do that.
>
>   

Okay.  FWIW, you can demand allocate with other schemes as well.

>>> - guest memory can be paged to disk.
>>> - you can mmap files into multiple guests for fast communication
>>> - you can mmap host files as backing store for guest block devices,
>>>   including ext2 with the -o xip mount option to avoid double paging
>>>   
>>>       
>> What do you mean exactly? to respond to a block device read by mmap()ing 
>> the backing file into the pages the host requested?
>>
>> (e.g. turn a host bio read into a guest mmap)
>>     
>
> The idea would be to mmap the file into the guest real address space.
> With -o xip, the page cache for the virtual device would basically
> reside in that high address range.
>   

Ah, I see what you mean now.  Like the "memory technology device" thing.



> Guest users reading/writing files on it cause a memcopy between guest
> user space and the host file mapping, done by the guest file system
> implementation.
>
> The interesting point here is how to handle a host page fault on the
> file mapping. The solution on z/VM for this is to generate a special
> exception for this that will be caught by the guest kernel, telling
> it to wait until the page is there. The guest kernel can then put the
> current thread to sleep and do something else, until a second exception
> tells it that the page has been loaded by the host. The guest then
> wakes up the sleeping thread.
>
> This can work the same way for host file backed (guest block device)
> and host anonymous (guest RAM) memory.
>
>   

Certainly something like that can be done, for paravirtualized guests.
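
A minimal sketch of the guest side of the two-exception scheme described above, assuming the host tags each notification with a token that identifies the page. The token, the type names, and the handler names are hypothetical and only illustrate the sleep/wake bookkeeping; they are not taken from z/VM or from the KVM patches.

```c
#include <assert.h>

/* Toy model of a guest thread that may block on a host page-in. */
enum thread_state { THREAD_RUNNABLE, THREAD_WAITING_FOR_PAGE };

struct guest_thread {
    enum thread_state state;
    unsigned long wait_token;   /* hypothetical id of the page being loaded */
};

/* First exception: the host reports that the page is not resident yet. */
static void on_page_pending(struct guest_thread *t, unsigned long token)
{
    assert(t->state == THREAD_RUNNABLE);
    t->state = THREAD_WAITING_FOR_PAGE;
    t->wait_token = token;
    /* A real guest scheduler would now pick another thread to run. */
}

/*
 * Second exception: the host reports that the page for `token` is loaded.
 * Returns 1 if this thread was woken, 0 if the notification was not for it.
 */
static int on_page_ready(struct guest_thread *t, unsigned long token)
{
    if (t->state != THREAD_WAITING_FOR_PAGE || t->wait_token != token)
        return 0;
    t->state = THREAD_RUNNABLE;
    return 1;
}
```

The guest has to cooperate by installing both handlers, which is why this only applies to paravirtualized guests.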

>> If we allow the pages to be writable, the guest could write into the 
>> virtual block device just by modifying a read page (which might have been 
>> discarded and no longer related to the block device)
>>     
>
> In your virtual mmu (or nested page table), you need to make sure that
> the page is mapped with the intersection of the guest vm_prot and host
> vm_prot into guest users.
>
>   

Yes.  My comment was based on an incorrect understanding of your suggestion.
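
To make the intersection rule concrete, here is a minimal sketch, assuming permissions are represented as a small bitmask. The bit names and the helper are hypothetical and do not correspond to real kernel definitions.

```c
#include <stdint.h>

/* Hypothetical permission bits, for illustration only. */
enum {
    PROT_R = 1u << 0,
    PROT_W = 1u << 1,
    PROT_X = 1u << 2,
};

/*
 * Permissions to install in the shadow (or nested) page table entry:
 * a right is granted only if both the guest mapping and the host
 * mapping grant it.
 */
static inline uint32_t shadow_prot(uint32_t guest_prot, uint32_t host_prot)
{
    return guest_prot & host_prot;
}
```

With this rule, a page that the host maps read-only never becomes writable in the guest, whatever the guest page tables request.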

>> 2. The next mmu implementation, which caches guest translations.
>>
>> The potential problem above now becomes acute.  The guest will have 
>> kernel mappings for every page, and after a short while they'll all be 
>> faulted in and locked.  This defeats the swap integration which is IMO a 
>> very strong point.
>>
>> We can work around that by periodically forcing out translations (some 
>> kind of clock algorithm) at some rate so the host vm can have a go at 
>> them.  That can turn out to be expensive as we'll need to interrupt all 
>> running vcpus to flush (real) tlb entries.
>>     
>
> Don't understand. Can't one CPU cause a TLB entry to be flushed on all
> CPUs?
>
>   

It's not about tlb entries.  The shadow page tables collapse a GV -> HV 
-> HP double translation into a single GV -> HP page table.  When the 
Linux vm goes around evicting pages, it invalidates those mappings.

There are two solutions possible: lock pages which participate in these 
translations (and their number can be large) or modify the Linux vm to 
consult a reverse mapping and remove the translations (in which case TLB 
entries need to be removed).
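
A toy sketch of the second option, assuming single-level tables indexed by page number. All names are hypothetical, and the linear scan in `host_evict()` stands in for a real reverse mapping; this is not the code from the patch series.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES  8
#define NO_PAGE (-1)

struct toy_mmu {
    int guest_pt[NPAGES];   /* GV page -> HV page  (guest-controlled) */
    int host_pt[NPAGES];    /* HV page -> HP frame (host-controlled)  */
    int shadow[NPAGES];     /* cached GV page -> HP frame             */
};

static void toy_mmu_init(struct toy_mmu *m)
{
    for (size_t i = 0; i < NPAGES; i++) {
        m->guest_pt[i] = NO_PAGE;
        m->host_pt[i] = NO_PAGE;
        m->shadow[i] = NO_PAGE;
    }
}

/* Fill one shadow entry by composing the two translations. */
static int shadow_fault(struct toy_mmu *m, int gv)
{
    assert(gv >= 0 && gv < NPAGES);
    int hv = m->guest_pt[gv];
    if (hv < 0 || hv >= NPAGES || m->host_pt[hv] == NO_PAGE)
        return NO_PAGE;
    m->shadow[gv] = m->host_pt[hv];
    return m->shadow[gv];
}

/*
 * The host evicts an HV page: drop the host mapping and every shadow
 * entry that points at the same frame. Returns the number of shadow
 * entries dropped; if it is nonzero, the caller still has to flush
 * the corresponding TLB entries.
 */
static int host_evict(struct toy_mmu *m, int hv)
{
    assert(hv >= 0 && hv < NPAGES);
    int hp = m->host_pt[hv];
    int dropped = 0;

    if (hp == NO_PAGE)
        return 0;
    for (size_t gv = 0; gv < NPAGES; gv++) {
        if (m->shadow[gv] == hp) {
            m->shadow[gv] = NO_PAGE;
            dropped++;
        }
    }
    m->host_pt[hv] = NO_PAGE;
    return dropped;
}
```

The first option (locking) would instead refuse the eviction while any shadow entry refers to the frame, which is what makes the number of pinned pages a concern.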

>>   b.  we need to hide the userspace portion of the monitor from the 
>> guest physical address space
>>     
>
> That depends on your trust model. You could simply say that you expect
> the guest real mode to have the same privileges as the host application
> (your monitor), and not care if a guest can shoot itself in the foot
> by overwriting the monitor.
>   

It can shoot not only its foot, but anything the monitor's uid has 
access to.  Host files, the host network, other guests belonging to the 
user, etc.

>>   c.  we need to extend host tlb invalidations to invalidate tlbs on guests
>>     
>
> I don't understand much about the x86 specific memory management,
> but shouldn't a TLB invalidate of a given page do the right thing
> on all CPUs, even if they are currently running a guest?
>   
It's worse than I thought: tlb entries generated by guest accesses are 
tagged with the guest virtual address, so if you remove a guest 
physical/host virtual page you need to invalidate the entire guest tlb.
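
A toy model of why the whole guest tlb has to go, assuming a tiny software TLB whose only per-page operation is invalidation by guest virtual tag. The structure and function names are hypothetical.

```c
#include <stddef.h>

#define TLB_SIZE 4

struct toy_tlb_entry {
    int valid;
    int gv;     /* tag: guest virtual page      */
    int hp;     /* payload: host physical frame */
};

struct toy_tlb {
    struct toy_tlb_entry e[TLB_SIZE];
};

/* The only per-page operation available: invalidate by guest virtual tag. */
static void tlb_flush_gv(struct toy_tlb *t, int gv)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (t->e[i].valid && t->e[i].gv == gv)
            t->e[i].valid = 0;
}

static void tlb_flush_all(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = 0;
}

/*
 * The host removes a host virtual page. It knows `hv` but not which
 * guest virtual pages currently translate through it, so it cannot
 * call tlb_flush_gv() and has to flush everything.
 */
static void host_page_removed(struct toy_tlb *t, int hv)
{
    (void)hv;   /* no by-hv (or by-frame) lookup exists in this model */
    tlb_flush_all(t);
}

static int tlb_valid_count(const struct toy_tlb *t)
{
    int n = 0;
    for (size_t i = 0; i < TLB_SIZE; i++)
        n += t->e[i].valid ? 1 : 0;
    return n;
}
```

A reverse map from host pages to guest virtual addresses would allow targeted `tlb_flush_gv()` calls instead, at the cost of maintaining that map.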


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
