From: Avi Kivity <avi@redhat.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
Pavel Machek <pavel@ucw.cz>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jeremy@goop.org, hugh.dickins@tiscali.co.uk, ngupta@vflare.org,
JBeulich@novell.com, chris.mason@oracle.com,
kurt.hackel@oracle.com, dave.mccracken@oracle.com,
npiggin@suse.de, akpm@linux-foundation.org, riel@redhat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
Date: Fri, 30 Apr 2010 21:08:33 +0300
Message-ID: <4BDB1CA1.1000006@redhat.com>
In-Reply-To: <10e6761a-fb7a-421d-97fc-1f3b6cd94622@default>
On 04/30/2010 07:43 PM, Dan Magenheimer wrote:
>> Given that whenever frontswap fails you need to swap anyway, it is
>> better for the host to never fail a frontswap request and instead back
>> it with disk storage if needed. This way you avoid a pointless vmexit
>> when you're out of memory. Since it's disk backed it needs to be
>> asynchronous and batched.
>>
>> At this point we're back with the ordinary swap API. Simply have your
>> host expose a device which is write cached by host memory, you'll have
>> all the benefits of frontswap with none of the disadvantages, and with
>> no changes to the guest.
>>
> I think you are making a number of possibly false assumptions here:
> 1) The host [the frontswap backend may not even be a hypervisor]
>
True. My remarks only apply to frontswap-to-hypervisor; for internally
consumed frontswap the situation is different.
> 2) can back it with disk storage [not if it is a bare-metal hypervisor]
>
So it seems a bare-metal hypervisor has less access to the bare metal
than a non-bare-metal hypervisor?
Seriously, leave the bare-metal FUD to Simon. People on this list know
that kvm and Xen have exactly the same access to the hardware (well,
actually Xen needs to use privileged guests to access some of its hardware).
> 3) avoid a pointless vmexit [no vmexit for a non-VMX (e.g. PV) guest]
>
There's still an exit. It's much faster than a vmx/svm vmexit but still
nontrivial.
But why are we optimizing for 5-year-old hardware?
> 4) when you're out of memory [how can this be determined outside of
> the hypervisor?]
>
It's determined by the hypervisor, same as with tmem. The guest swaps
to a virtual disk, the hypervisor places the data in RAM if it's
available, or on disk if it isn't. Write-back caching in all its glory.
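To spell out what that write-back cache looks like on the host side, a
rough C sketch (the helper names here are made up for illustration, not
an existing kvm or Xen interface):

    /* A guest swap write arriving at the host's virtual disk.  The
     * host acts as a write-back cache: the page stays in host RAM
     * when memory is available, and only goes to disk when it isn't
     * (or later, on eviction). */
    struct swap_write {
            unsigned long long sector;
            void *data;                     /* one guest page */
    };

    int handle_guest_swap_write(struct swap_write *w)
    {
            if (host_has_free_memory()) {
                    host_cache_insert(w->sector, w->data);
                    return complete_io(w, 0);       /* fast path: RAM */
            }
            /* Slow path: real disk I/O, completed asynchronously the
             * way any block device completes it. */
            return host_disk_write_async(w, complete_io);
    }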
> And, importantly, "have your host expose a device which is write
> cached by host memory"... you are implying that all guest swapping
> should be done to a device managed/controlled by the host? That
> eliminates guest swapping to directIO/SRIOV devices doesn't it?
>
You can have multiple swap devices.
As for SR-IOV, you'll see synchronous frontswap reduce throughput: SR-IOV
will swap with fewer than one exit per page and will DMA directly to/from
guest pages, while frontswap/tmem will carry a one-exit-per-page hit (even
if no swap actually happens) plus the copy cost (if it does).
The API really, really wants to be asynchronous.
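To make the cost difference concrete, an illustrative sketch (the names
hypercall_put, queue_req, notify_host and struct swap_req are invented
for the example, not the actual frontswap or virtio interfaces):

    /* Synchronous, frontswap-style: one exit (hypercall) per page,
     * and the vcpu blocks until the host answers. */
    int sync_put_page(unsigned long offset, void *page)
    {
            return hypercall_put(offset, page);     /* 1 exit per page */
    }

    /* Asynchronous, block-device-style: queue a whole batch, then a
     * single kick; completions arrive later via interrupt/callback. */
    struct swap_req { unsigned long offset; void *page; };

    void async_put_pages(struct swap_req *reqs, int nr,
                         void (*done)(struct swap_req *))
    {
            int i;

            for (i = 0; i < nr; i++)
                    queue_req(&reqs[i], done);      /* no exit here */
            notify_host();                          /* 1 exit per batch */
    }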
> Anyway, I think we can see now why frontswap might not be a good
> match for a hosted hypervisor (KVM), but that doesn't make it
> any less useful for a bare-metal hypervisor (or TBD for in-kernel
> compressed swap and TBD for possible future pseudo-RAM technologies).
>
In-kernel compressed swap does seem to be a good match for a synchronous
API. For future memory devices, or even bare-metal buzzword-compliant
hypervisors, I disagree. An asynchronous API is required for
efficiency, and they'll all have swap capability sooner or later (kvm,
vmware, and I believe xen 4 already do).
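For the compressed-swap case the synchronous hook is genuinely cheap,
because the "backend" is just compression into a RAM pool and never waits
on a device; roughly (illustrative signatures only, not the exact
frontswap hooks):

    /* Hypothetical synchronous store hook for in-kernel compressed
     * swap: compress the page into a memory pool and return.  There
     * is no I/O to wait for, so blocking the caller costs little. */
    int compressed_swap_store(unsigned long offset, const void *page)
    {
            size_t clen;
            void *cbuf = compress_page(page, &clen);        /* e.g. LZO */

            if (!cbuf)
                    return -1;
            if (clen >= PAGE_SIZE) {
                    free_compressed(cbuf);
                    return -1;      /* incompressible: use the real swap device */
            }
            return pool_insert(offset, cbuf, clen);
    }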
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.