From: Avi Kivity <avi@redhat.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jeremy@goop.org, hugh.dickins@tiscali.co.uk, ngupta@vflare.org,
JBeulich@novell.com, chris.mason@oracle.com,
kurt.hackel@oracle.com, dave.mccracken@oracle.com,
npiggin@suse.de, akpm@linux-foundation.org, riel@redhat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
Date: Sat, 24 Apr 2010 21:22:07 +0300
Message-ID: <4BD336CF.1000103@redhat.com>
In-Reply-To: <b559c57a-0acb-4338-af21-dbfc3b3c0de5@default>
On 04/23/2010 06:56 PM, Dan Magenheimer wrote:
>>> Each page is either in frontswap OR on the normal swap device,
>>> never both. So, yes, both reads and writes are avoided if memory
>>> is available and there is no write issued to the io subsystem if
>>> memory is available. The is_memory_available decision is determined
>>> by the hypervisor dynamically for each page when the guest attempts
>>> a "frontswap_put". So, yes, you are indeed "swapping to the
>>> hypervisor" but, at least in the case of Xen, the hypervisor
>>> never swaps any memory to disk so there is never double swapping.
>>>
>> I see. So why not implement this as an ordinary swap device, with a
>> higher priority than the disk device? This way we reuse an API and keep
>> things asynchronous, instead of introducing a special-purpose API.
>>
> Because the swapping API doesn't adapt well to dynamic changes in
> the size and availability of the underlying "swap" device, which
> is very useful for swap to (bare-metal) hypervisor.
>
Can we extend it? Adding new APIs is easy, but they are harder to
maintain in the long term.
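A minimal user-space sketch of the put-or-fall-back behaviour described in the quoted text above may help readers follow the argument. It is a toy model, not kernel code: `ToyHypervisorPool`, `ToySwap`, and every method name are hypothetical, and the only properties taken from the thread are that the hypervisor accepts or refuses each page individually and that a page lives in exactly one place.

```python
class ToyHypervisorPool:
    """Hypothetical stand-in for hypervisor-owned memory.

    capacity_pages may be changed at any time, which models the
    dynamic size and availability discussed in the thread.
    """

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = {}

    def put(self, key, data):
        # The pool decides per page; a refusal is a normal outcome.
        if key not in self.pages and len(self.pages) >= self.capacity_pages:
            return False
        self.pages[key] = data
        return True

    def get(self, key):
        return self.pages.get(key)


class ToySwap:
    """Tries the pool first and falls back to a simulated disk."""

    def __init__(self, pool):
        self.pool = pool
        self.disk = {}

    def swap_out(self, key, data):
        if self.pool.put(key, data):
            # Keep the invariant: a page is never in both places.
            self.disk.pop(key, None)
            return "frontswap"
        self.disk[key] = data
        return "disk"

    def swap_in(self, key):
        data = self.pool.get(key)
        if data is not None:
            return data
        return self.disk[key]
```

With a one-page pool, the first `swap_out` lands in the pool and the second goes to the simulated disk; raising `capacity_pages` later lets the next put for that page succeed without any device being resized.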
>> Doesn't this commit the hypervisor to retain this memory? If so, isn't
>> it simpler to give the page to the guest (so now it doesn't need to
>> swap at all)?
>>
> Yes, the hypervisor is committed to retain the memory. In
> some ways, giving a page of memory to a guest (via ballooning)
> is simpler and in some ways not. When a guest "owns" a page,
> it can do whatever it wants with it, independent of what is best
> for the "whole" virtualized system. When the hypervisor
> "owns" the page on behalf of the guest but the guest can't
> directly address it, the hypervisor has more flexibility.
> For example, tmem optionally compresses all frontswap pages,
> effectively doubling the size of its available memory.
> In the future, knowing that a guest application can never
> access the pages directly, it might store all frontswap pages in
> (slower but still synchronous) phase change memory or "far NUMA"
> memory.
>
Ok. For non-traditional RAM uses I really think an async API is
needed. If the API is backed by a CPU, a synchronous operation is fine,
but once it isn't RAM, it can be all kinds of interesting things.
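The difference between the two API shapes can be illustrated with a small sketch. This is an assumption-laden toy, not the frontswap interface or any proposed kernel API: `SyncStore`, `AsyncStore`, `submit_put`, and `poll` are hypothetical names chosen for illustration.

```python
from collections import deque


class SyncStore:
    """Synchronous shape: the caller has its answer when put() returns."""

    def __init__(self):
        self.pages = {}

    def put(self, key, data):
        self.pages[key] = data
        return True


class AsyncStore:
    """Asynchronous shape: the caller submits and is notified later."""

    def __init__(self):
        self.pages = {}
        self.pending = deque()

    def submit_put(self, key, data, on_done):
        # Nothing is stored yet; the request is only queued.
        self.pending.append((key, data, on_done))

    def poll(self):
        # Stand-in for the slow backend finishing its queued work.
        completed = 0
        while self.pending:
            key, data, on_done = self.pending.popleft()
            self.pages[key] = data
            on_done(key, True)
            completed += 1
        return completed
```

The synchronous form is only reasonable while the backing store answers at memory speed; the asynchronous form lets the caller do other work until the completion callback runs.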
Note that even if you do give the page to the guest, you still control
how it can access it, through the page tables. So, for example, you can
easily compress a guest's pages without telling it; whenever it
touches them, you decompress them on the fly.
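The transparent-compression idea can be sketched as a toy model in which a dictionary of present pages stands in for the page tables and a missing entry stands in for a fault. Everything here is hypothetical (`ToyGuestMemory`, `host_compress`, `_fault_in`), and `zlib` is used only as a convenient compressor, not because any hypervisor is known to use it.

```python
import zlib

PAGE_SIZE = 4096


class ToyGuestMemory:
    """Guest pages that the host may compress behind the guest's back."""

    def __init__(self):
        self.present = {}     # pfn -> bytearray, directly accessible
        self.compressed = {}  # pfn -> bytes, not mapped for the guest

    def map_page(self, pfn):
        self.present[pfn] = bytearray(PAGE_SIZE)

    def host_compress(self, pfn):
        # Host side: unmap the page and keep only a compressed copy.
        page = self.present.pop(pfn)
        self.compressed[pfn] = zlib.compress(bytes(page))

    def _fault_in(self, pfn):
        # Models the fault path: decompress and map the page again.
        raw = zlib.decompress(self.compressed.pop(pfn))
        self.present[pfn] = bytearray(raw)

    def guest_read(self, pfn, offset):
        if pfn not in self.present:
            self._fault_in(pfn)
        return self.present[pfn][offset]

    def guest_write(self, pfn, offset, value):
        if pfn not in self.present:
            self._fault_in(pfn)
        self.present[pfn][offset] = value
```

The guest-facing calls never change: a read of a compressed page returns the same byte as before, only after an extra decompression step that the guest cannot observe except through latency.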
>> I think it will be true in an overwhelming number of cases. Flash is
>> new enough that most devices support scatter/gather.
>>
> I wasn't referring to hardware capability but to the availability
> and timing constraints of the pages that need to be swapped.
>
I have a feeling we're talking past each other here. Swap has no timing
constraints; it is asynchronous and usually goes to slow devices.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.