Re: [PATCH 6/13] KVM: memory slot management

From: Avi Kivity <avi@qumranet.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linux-kernel@vger.kernel.org, kvm-devel@lists.sourceforge.net
Subject: Re: [PATCH 6/13] KVM: memory slot management
Date: Fri, 27 Oct 2006 15:26:03 +0200
Message-ID: <454208EB.7080007@qumranet.com>
In-Reply-To: <200610270937.11646.arnd@arndb.de>

Arnd Bergmann wrote:
> On Friday 27 October 2006 07:47, Avi Kivity wrote:
>   
>> Arnd Bergmann wrote:
>>     
>>> - no need to preallocate memory that the guest doesn't actually use.
>>>   
>>>       
>> Well, a fully virtualized guest will likely use all the memory it gets.  
>> Linux certainly will.
>>     
>
> Only if it does lots of disk accesses that load stuff into
> page/inode/dentry cache. Single-application guests don't necessarily
> do that.
>
>   

Okay.  FWIW, you can demand allocate with other schemes as well.

>>> - guest memory can be paged to disk.
>>> - you can mmap files into multiple guests for fast communication
>>> - you can use mmap host files as backing store for guest blockdevices,
>>>   including ext2 with the -o xip mount option to avoid double paging
>>>   
>>>       
>> What do you mean exactly? to respond to a block device read by mmap()ing 
>> the backing file into the pages the host requested?
>>
>> (e.g. turn a host bio read into a guest mmap)
>>     
>
> The idea would be to mmap the file into the guest real address space.
> With -o xip, the page cache for the virtual device would basically
> reside in that high address range.
>   

Ah, I see what you mean now.  Like the "memory technology device" thing.



> Guest users reading/writing files on it cause a memcopy between guest
> user space and the host file mapping, done by the guest file system
> implementation.
>
> The interesting point here is how to handle a host page fault on the
> file mapping. The solution on z/VM for this is to generate a special
> exception for this that will be caught by the guest kernel, telling
> it to wait until the page is there. The guest kernel can then put the
> current thread to sleep and do something else, until a second exception
> tells it that the page has been loaded by the host. The guest then
> wakes up the sleeping thread.
>
> This can work the same way for host file backed (guest block device)
> and host anonymous (guest RAM) memory.
>
>   

Certainly something like that can be done, for paravirtualized guests.
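
To make the two-exception handshake concrete, here is a minimal guest-side sketch. Everything in it is an assumption for illustration: the names (guest_thread, page_pending, page_ready) and the use of a token to identify the page are hypothetical and not taken from z/VM or KVM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest-side bookkeeping; not an existing KVM or z/VM interface. */
enum thread_state { THREAD_RUNNABLE, THREAD_WAITING_FOR_PAGE };

struct guest_thread {
    enum thread_state state;
    unsigned long token;    /* identifies the page this thread is waiting for */
};

/* First exception: the host has started loading the page named by token.
 * The guest parks the faulting thread and is free to schedule another one. */
static void page_pending(struct guest_thread *t, unsigned long token)
{
    assert(t->state == THREAD_RUNNABLE);
    t->state = THREAD_WAITING_FOR_PAGE;
    t->token = token;
}

/* Second exception: the page is now resident on the host.
 * Wake every thread waiting on that token; returns how many were woken. */
static size_t page_ready(struct guest_thread *threads, size_t n,
                         unsigned long token)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (threads[i].state == THREAD_WAITING_FOR_PAGE &&
            threads[i].token == token) {
            threads[i].state = THREAD_RUNNABLE;
            woken++;
        }
    }
    return woken;
}
```

The guest has to be aware of both exceptions, which is why this only fits paravirtualized guests.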

>> If we allow the pages to be writable, the guest could write into the 
>> virtual block device just by modifying a read page (which might have been 
>> discarded and no longer related to the block device)
>>     
>
> In your virtual mmu (or nested page table), you need to make sure that
> the page is mapped with the intersection of the guest vm_prot and host
> vm_prot into guest users.
>
>   

Yes.  My comment was based on an incorrect understanding of your suggestion.
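
For reference, the intersection rule can be sketched in a few lines. The SK_PROT_* bits below are hypothetical stand-ins for whatever permission encoding the guest and host page tables really use; they are assumptions, not existing kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits, loosely modelled on PROT_READ/WRITE/EXEC. */
enum {
    SK_PROT_READ  = 1u << 0,
    SK_PROT_WRITE = 1u << 1,
    SK_PROT_EXEC  = 1u << 2,
    SK_PROT_MASK  = SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXEC,
};

/* Rights installed in the shadow (or nested) mapping: only what BOTH the
 * guest mapping and the host mapping allow. */
static unsigned int shadow_prot(unsigned int guest_prot, unsigned int host_prot)
{
    assert((guest_prot & ~SK_PROT_MASK) == 0);
    assert((host_prot & ~SK_PROT_MASK) == 0);
    return guest_prot & host_prot;
}

/* True only if both the guest and the host permit writes to the page. */
static bool shadow_may_write(unsigned int guest_prot, unsigned int host_prot)
{
    return (shadow_prot(guest_prot, host_prot) & SK_PROT_WRITE) != 0;
}
```

With this rule, a guest read/write mapping of a file the host mapped read-only ends up read-only, so the guest cannot modify the backing file through it.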

>> 2. The next mmu implementation, which caches guest translations.
>>
>> The potential problem above now becomes acute.  The guest will have 
>> kernel mappings for every page, and after a short while they'll all be 
>> faulted in and locked.  This defeats the swap integration which is IMO a 
>> very strong point.
>>
>> We can work around that by periodically forcing out translations (some 
>> kind of clock algorithm) at some rate so the host vm can have a go at 
>> them.  That can turn out to be expensive as we'll need to interrupt all 
>> running vcpus to flush (real) tlb entries.
>>     
>
> Don't understand. Can't one CPU cause a TLB entry to be flushed on all
> CPUs?
>
>   

It's not about tlb entries.  The shadow page tables collapse a GV -> HV 
-> HP  double translation into a GV -> HP page table.  When the Linux vm 
goes around evicting pages, it invalidates those mappings.

There are two solutions possible: lock pages which participate in these 
translations (and their number can be large) or modify the Linux vm to 
consult a reverse mapping and remove the translations (in which case TLB 
entries need to be removed).
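
A toy model of the second option, to show what the reverse lookup has to do. This is only a sketch under simplifying assumptions: single-level tables indexed by page number, a linear scan where a real reverse map would keep a per-page list, and hypothetical names (toy_mmu, shadow_fault, host_evict) that do not correspond to the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES  8            /* toy address space size, in pages */
#define INVALID UINT32_MAX   /* marks "no translation" */

struct toy_mmu {
    uint32_t gv_to_hv[NPAGES];  /* guest virtual -> host virtual */
    uint32_t hv_to_hp[NPAGES];  /* host virtual  -> host physical */
    uint32_t shadow[NPAGES];    /* cached guest virtual -> host physical */
};

/* Start with every table empty. */
static void toy_mmu_init(struct toy_mmu *m)
{
    for (size_t i = 0; i < NPAGES; i++)
        m->gv_to_hv[i] = m->hv_to_hp[i] = m->shadow[i] = INVALID;
}

/* Shadow fault: walk both translations once and cache the collapsed result. */
static uint32_t shadow_fault(struct toy_mmu *m, uint32_t gv)
{
    assert(gv < NPAGES);
    uint32_t hv = m->gv_to_hv[gv];

    if (hv == INVALID || m->hv_to_hp[hv] == INVALID)
        return INVALID;
    m->shadow[gv] = m->hv_to_hp[hv];
    return m->shadow[gv];
}

/* Host evicts host-virtual page hv: every shadow entry derived from it must
 * go too. Returns the number of shadow entries dropped; each one would also
 * need its TLB entry removed. */
static size_t host_evict(struct toy_mmu *m, uint32_t hv)
{
    size_t dropped = 0;

    assert(hv < NPAGES);
    for (size_t gv = 0; gv < NPAGES; gv++) {
        if (m->gv_to_hv[gv] == hv && m->shadow[gv] != INVALID) {
            m->shadow[gv] = INVALID;
            dropped++;
        }
    }
    m->hv_to_hp[hv] = INVALID;
    return dropped;
}
```

Without host_evict (or pinning the page), the shadow entry would keep pointing at a host physical page that the host has already reused.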

>>   b.  we need to hide the userspace portion of the monitor from the 
>> guest physical address space
>>     
>
> That depends on your trust model. You could simply say that you expect
> the guest real mode to have the same privileges as the host application
> (your monitor), and not care if a guest can shoot itself in the foot
> by overwriting the monitor.
>   

It can shoot not only its foot, but anything the monitor's uid has 
access to.  Host files, the host network, other guests belonging to the 
user, etc.

>>   c.  we need to extend host tlb invalidations to invalidate tlbs on guests
>>     
>
> I don't understand much about the x86 specific memory management,
> but shouldn't a TLB invalidate of a given page do the right thing
> on all CPUs, even if they are currently running a guest?
>   
It's worse than I thought: tlb entries generated by guest accesses are 
tagged with the guest virtual address, so if you remove a guest 
physical/host virtual page you need to invalidate the entire guest tlb.
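
A small sketch of why that is, assuming a toy software model of the tlb (the toy_tlb structure and function names are hypothetical). Targeted invalidation needs the tag, which is the guest virtual address; the host only knows which physical page it is taking away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SLOTS 4

struct toy_tlb_entry {
    bool valid;
    unsigned long gv;   /* tag: guest virtual address */
    unsigned long hp;   /* payload: host physical page */
};

struct toy_tlb {
    struct toy_tlb_entry e[TLB_SLOTS];
};

/* Targeted invalidation: possible only when the guest virtual address
 * (the tag) is known. Returns the number of entries dropped. */
static size_t tlb_invalidate_gv(struct toy_tlb *tlb, unsigned long gv)
{
    size_t dropped = 0;

    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (tlb->e[i].valid && tlb->e[i].gv == gv) {
            tlb->e[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* Drop every entry, related to the removed page or not. */
static size_t tlb_flush_all(struct toy_tlb *tlb)
{
    size_t dropped = 0;

    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (tlb->e[i].valid) {
            tlb->e[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* The host removes a page. It does not know which guest virtual addresses
 * map to it, so the only safe option in this model is a full flush. */
static size_t host_page_removed(struct toy_tlb *tlb)
{
    return tlb_flush_all(tlb);
}
```

The cost shows up in the return value of host_page_removed: unrelated translations are lost along with the one that actually referred to the removed page.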


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
