From: Avi Kivity <avi@qumranet.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shaohua Li <shaohua.li@intel.com>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
Date: Tue, 24 Jul 2007 09:21:41 +0300
Message-ID: <46A59A75.8050501@qumranet.com>
In-Reply-To: <1185257474.1803.216.camel@localhost.localdomain>

Rusty Russell wrote:
> On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>>>   
>>>       
>>>> Having an address_space (like your patch does) is remarkably simple, and 
>>>> requires few hooks from the current vm.  However using existing vmas 
>>>> mapped by the user has many advantages:
>>>>
>>>> - compatible with s390 requirements
>>>> - allows the user to use hugetlbfs pages, which have a performance 
>>>> advantage using ept/npt (but which are unswappable)
>>>> - allows the user to map a file (which can be regarded as a way to specify 
>>>> the swap device)
>>>> - better integration with the rest of the vm
>>>>     
>>>>         
>>> You don't need to expose the vmas.  You just have userspace point out
>>> the start+len of each region of memory it wants the guest to be able to
>>> access, and the address it wants it to appear in the guest.
>>>
>>> This is a slight superset of what lguest does in two ways:
>>>
>>> 1) my guest address == user address, but I'm looking at adding an offset
>>> so I don't have to link the launcher binary specially.
>>> 2) I have only one contiguous region of guest-physical memory, since I
>>> can place device memory immediately above "normal" mem.
>>>
>>>   
>>>       
>> My intent was to allow userspace to assign a virtual address
>> range to a memory slot.
>>
>> So long as you don't do swapping, all is simple, since you can do a
>> get_user_pages() on initialization or when installing a shadow pte.  But
>> if you want to swap, you need:
>>
>> - a way to transfer the dirty bit from the shadow ptes to the struct page
>>     
>
> Actually, get_user_pages() does that for you.  You have to make R/O any
> writable pte where the guest doesn't set the dirty bit (so you can trap
> it later) but last I put a printk in there, Linux doesn't do that.
>
>   

Don't understand.  You mean Linux always sets the dirty bit when it
makes a page writable?  Surely some mistake.

It probably does do so on demand write faults, but I'm sure the dirty
bit can get cleaned out by the swapper.
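
The dirty-bit scheme discussed above can be sketched as a toy user-space
model. This is an illustration under stated assumptions, not kvm or lguest
code: Page, GuestPte, ShadowPte and the three functions are hypothetical
names, and clean_page() stands in for whatever the host does when it writes
a page back and clears its dirty state.

```python
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool = False  # stands in for the host's per-page dirty flag


@dataclass
class GuestPte:
    writable: bool
    dirty: bool = False


@dataclass
class ShadowPte:
    page: Page
    writable: bool


def install_shadow(gpte, page):
    # Grant write access only if the guest pte is already dirty;
    # otherwise map read-only so the first write traps.
    return ShadowPte(page=page, writable=gpte.writable and gpte.dirty)


def write_fault(gpte, spte):
    # Returns True if the fault was handled, False if the guest itself
    # forbids the write (a genuine guest protection fault).
    if not gpte.writable:
        return False
    gpte.dirty = True
    spte.page.dirty = True  # transfer the dirty state to the host page
    spte.writable = True
    return True


def clean_page(spte):
    # Hypothetical model of the host cleaning the page: clear the dirty
    # flag and write-protect the shadow pte so the next write traps again.
    spte.page.dirty = False
    spte.writable = False
```

The point of the model is that a shadow pte is writable only while the page
is already known to be dirty, so no write can go unrecorded.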

>> - a way to let the vm rmap know that there are shadow ptes that point to
>> the page in addition to Linux ptes.  These shadow ptes may be in a
>> different format than Linux ptes.
>> - a different tlb invalidation method with ASIDs
>>     
>
> Well first I was just going to see how well hooking into the shrinker
> works.  That might be sufficient: just throw out shadow refs to pages
> when there's pressure.
>   

Ah, interesting.  Yes, you trim the shadow page table cache which unrefs
pages for you.

Maybe that's a good way to get things started.
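
A minimal sketch of the shrinker idea, assuming the shadow cache is kept in
least-recently-used order and each shadow entry holds one reference on the
page it maps. ShadowCache, map() and shrink() are hypothetical names for
illustration; the real kernel shrinker interface is not modelled here.

```python
from collections import OrderedDict


class ShadowCache:
    def __init__(self):
        self.entries = OrderedDict()  # guest frame number -> page, oldest first
        self.refcount = {}            # page -> references held by shadow entries

    def map(self, gfn, page):
        # Re-touching an existing entry only refreshes its LRU position.
        if gfn in self.entries:
            self.entries.move_to_end(gfn)
            return
        self.entries[gfn] = page
        self.refcount[page] = self.refcount.get(page, 0) + 1

    def shrink(self, nr_to_scan):
        # Under memory pressure, drop the oldest shadow entries and the
        # page references they hold. Returns how many entries were dropped.
        freed = 0
        while self.entries and freed < nr_to_scan:
            _gfn, page = self.entries.popitem(last=False)
            self.refcount[page] -= 1
            freed += 1
        return freed
```

Once a page's count reaches zero in this model, nothing in the shadow cache
pins it, which is the property that would let the host reclaim it.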

> If not, it does get harder.  A callback in the mm struct to say "I want
> to swap your page out" is required if we don't take a reference to the
> page.  Dirty bit handling would be an interesting issue (maybe the
> callback can say "No!" and dirty the page again?).
>   

Since we have rmap, I don't see that as an issue.  Given a page, we can
easily drop all refs.  Though lguest doesn't do that, right?
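
"Given a page, drop all refs" can be pictured with a toy reverse map from
each page to the shadow pte slots that point at it. This is a sketch under
assumptions, not the kvm rmap: ShadowRmap, add() and drop_all() are
hypothetical names, and slots and pages are plain hashable values.

```python
from collections import defaultdict


class ShadowRmap:
    def __init__(self):
        self.sptes = {}               # shadow pte slot -> page it maps
        self.rmap = defaultdict(set)  # page -> slots that map it

    def add(self, slot, page):
        # If the slot already mapped another page, unlink it first.
        old = self.sptes.get(slot)
        if old is not None:
            self.rmap[old].discard(slot)
        self.sptes[slot] = page
        self.rmap[page].add(slot)

    def drop_all(self, page):
        # Remove every shadow pte that references this page.
        # Returns the number of references dropped.
        slots = self.rmap.pop(page, set())
        for slot in slots:
            del self.sptes[slot]
        return len(slots)
```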

I'm also concerned with picking the correct page, but there's no good
solution here.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

