From: Avi Kivity <avi@redhat.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jeremy@goop.org, hugh.dickins@tiscali.co.uk, ngupta@vflare.org,
JBeulich@novell.com, chris.mason@oracle.com,
kurt.hackel@oracle.com, dave.mccracken@oracle.com,
npiggin@suse.de, akpm@linux-foundation.org, riel@redhat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
Date: Sun, 25 Apr 2010 15:11:46 +0300
Message-ID: <4BD43182.1040508@redhat.com>
In-Reply-To: <d1bb78ca-5ef6-4a8d-af79-a265f2d4339c@default>
On 04/25/2010 03:30 AM, Dan Magenheimer wrote:
>>>> I see. So why not implement this as an ordinary swap device, with a
>>>> higher priority than the disk device? This way we reuse an API and
>>>> keep things asynchronous, instead of introducing a special-purpose API.
>>>>
>>> Because the swapping API doesn't adapt well to dynamic changes in
>>> the size and availability of the underlying "swap" device, a
>>> capability that is very useful for swapping to a (bare-metal) hypervisor.
>>>
>> Can we extend it? Adding new APIs is easy, but they are harder to
>> maintain in the long term.
>>
> Umm... I think the difference between a "new" API and extending
> an existing one here is a choice of semantics. As designed, frontswap
> is an extremely simple, only-very-slightly-intrusive set of hooks that
> allows swap pages to, under some conditions, go to pseudo-RAM instead
> of an asynchronous disk-like device. It works today with at least
> one "backend" (Xen tmem), is shipping today in real distros, and is
> extremely easy to enable/disable via CONFIG or module... meaning
> no impact on anyone other than those who choose to benefit from it.
>
> "Extending" the existing swap API, which has largely been untouched for
> many years, seems like a significantly more complex and error-prone
> undertaking that will affect nearly all Linux users with a likely long
> bug tail. And, by the way, there is no existence proof that it
> will be useful.
>
> Seems like a no-brainer to me.
>
My issue is with the API's synchronous nature. Both RAM and more exotic
memories can be used with DMA instead of copying. A synchronous
interface gives this up.
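To make the distinction concrete, here is a minimal user-space sketch. It is not frontswap's actual interface: `sync_put()`, `async_submit()`, `struct async_req` and `engine_run()` are hypothetical names. In the synchronous form the calling CPU copies the page and knows the outcome on return, so a refusal can be handled at once by falling back to the disk path. In the asynchronous form the caller only queues a descriptor; something else (a DMA engine, on real hardware) moves the data later and reports completion through a callback.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NSLOTS 4

/* Simulated backend memory ("pseudo-RAM"). */
static char store[NSLOTS][PAGE_SIZE];
static int used[NSLOTS];

/* Synchronous put: the caller's CPU does the copy and gets the result
 * before returning. -1 means "refused; send the page to the swap disk". */
static int sync_put(unsigned slot, const char *page)
{
    if (slot >= NSLOTS || used[slot])
        return -1;
    memcpy(store[slot], page, PAGE_SIZE);
    used[slot] = 1;
    return 0;
}

/* Asynchronous request: only a description of the transfer. */
struct async_req {
    unsigned slot;
    const char *page;
    int status;                            /* set on completion */
    void (*done)(struct async_req *req);   /* completion callback */
};

#define QUEUE_LEN 8
static struct async_req *queue[QUEUE_LEN];
static int queued;

/* Returns immediately; no data has moved yet. */
static int async_submit(struct async_req *req)
{
    if (queued == QUEUE_LEN)
        return -1;
    queue[queued++] = req;
    return 0;
}

/* Stand-in for a DMA engine: in this simulation it is just a later call
 * that performs the queued copies and then invokes each callback. */
static void engine_run(void)
{
    for (int i = 0; i < queued; i++) {
        struct async_req *req = queue[i];
        assert(req->done != NULL);
        if (req->slot < NSLOTS && !used[req->slot]) {
            memcpy(store[req->slot], req->page, PAGE_SIZE);
            used[req->slot] = 1;
            req->status = 0;
        } else {
            req->status = -1;
        }
        req->done(req);
    }
    queued = 0;
}

/* Example completion callback: just counts finished requests. */
static int completions;
static void count_done(struct async_req *req)
{
    (void)req;
    completions++;
}
```

The point of the second form is that the submitter never waits on the copy; whether the data is moved by the CPU or by a DMA engine becomes a backend decision rather than something the interface dictates.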
>> OK. For non-traditional RAM uses I really think an async API is
>> needed. If the API is backed by a CPU-synchronous operation, that is
>> fine, but once it isn't RAM, it can be all kinds of interesting things.
>>
> Well, we shall see. It may also be the case that the existing
> asynchronous swap API will work fine for some non-traditional RAM;
> and it may also be the case that frontswap works fine for some
> non-traditional RAM. I agree there is fertile ground for exploration
> here. But let's not allow our speculation on what may or may
> not work in the future to halt forward progress of something that
> works today.
>
Let's not allow the urge to merge to prevent us from doing the right thing.
>
>
>> Note that even if you do give the page to the guest, you still control
>> how it can access it, through the page tables. So for example you can
>> easily compress a guest's pages without telling it about it; whenever
>> it touches them you decompress them on the fly.
>>
> Yes, at a much larger, more invasive cost to the kernel. Frontswap
> and cleancache and tmem are all well-layered for a good reason.
>
No need to change the kernel at all; the hypervisor controls the page
tables.
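A toy simulation of that idea, under loudly stated assumptions: `struct gpage`, `host_compress()` and `guest_read()` are hypothetical, the `present` flag stands in for the host-controlled (second-level) page-table entry, and a trivial run-length encoder stands in for a real compressor. The guest code path (`guest_read()`) has no compression logic at all; a non-present entry "faults" into the host, which decompresses and remaps the page before the access completes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES 4

/* Hypothetical host-side view of one guest page. */
struct gpage {
    int present;                    /* mapped in the host-controlled tables? */
    unsigned char data[PAGE_SIZE];  /* plain contents while present */
    unsigned char *packed;          /* compressed copy while not present */
    size_t packed_len;
};

static struct gpage mem[NPAGES];
static int faults;                  /* host-side decompress events */

/* Toy run-length encoder: (count, byte) pairs, count <= 255. */
static size_t rle_pack(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n;) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == b && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;
}

static void rle_unpack(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 2) {
        memset(out + o, in[i + 1], in[i]);
        o += in[i];
    }
}

/* Host compresses a page and unmaps it; the guest is not told. */
static void host_compress(unsigned pfn)
{
    struct gpage *p = &mem[pfn];
    unsigned char *buf = malloc(2 * PAGE_SIZE);  /* worst case for this RLE */
    assert(buf != NULL);
    p->packed_len = rle_pack(p->data, PAGE_SIZE, buf);
    p->packed = buf;
    memset(p->data, 0, PAGE_SIZE);  /* plain copy discarded (frame reusable) */
    p->present = 0;
}

/* Guest access: a non-present entry traps to the host, which
 * decompresses and remaps the page, then the access proceeds. */
static unsigned char guest_read(unsigned pfn, size_t off)
{
    struct gpage *p = &mem[pfn];
    if (!p->present) {
        faults++;
        rle_unpack(p->packed, p->packed_len, p->data);
        free(p->packed);
        p->packed = NULL;
        p->present = 1;
    }
    return p->data[off];
}
```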
>> Swap has no timing
>> constraints, it is asynchronous and usually to slow devices.
>>
> What I was referring to is that the existing swap code DOES NOT
> always have the ability to collect N scattered pages before
> initiating an I/O write suitable for a device (such as an SSD)
> that is optimized for writing N pages at a time. That is what
> I meant by a timing constraint. See references to page_cluster
> in the swap code (and this is for contiguous pages, not scattered).
>
I see. Given that swap-to-flash will soon be way more common than
frontswap, it needs to be solved (either in flash or in the swap code).
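One way the swap-code side could look, as a minimal sketch rather than anything the current swap code does: pages from anywhere in memory are queued into a hypothetical `struct write_batch`, each is given the next slot of a contiguous run on the device, and a single write is issued once `BATCH` pages have accumulated (or on an explicit flush). `BATCH`, `batch_add()`, `batch_flush()` and the simulated device array are all assumptions; real flash geometry varies.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define BATCH 4           /* illustrative write unit; real flash differs */
#define DEV_SLOTS 64

static char device[DEV_SLOTS][PAGE_SIZE];  /* simulated flash swap area */
static unsigned next_slot;                  /* next free contiguous slot */
static int device_writes;                   /* number of I/Os issued */

struct write_batch {
    const char *pages[BATCH];  /* scattered source pages; must stay stable */
    unsigned first_slot;       /* start of the reserved contiguous run */
    int n;                     /* pages queued so far */
};

/* Issue one (simulated) sequential write covering the whole run. */
static void batch_flush(struct write_batch *b)
{
    if (b->n == 0)
        return;
    for (int i = 0; i < b->n; i++)
        memcpy(device[b->first_slot + i], b->pages[i], PAGE_SIZE);
    next_slot = b->first_slot + (unsigned)b->n;
    device_writes++;
    b->n = 0;
}

/* Queue a page and return the slot it will occupy (needed for swap-in). */
static unsigned batch_add(struct write_batch *b, const char *page)
{
    if (b->n == 0) {
        b->first_slot = next_slot;                   /* start a new run */
        assert(b->first_slot + BATCH <= DEV_SLOTS);  /* toy: no wraparound */
    }
    unsigned slot = b->first_slot + (unsigned)b->n;
    b->pages[b->n++] = page;
    if (b->n == BATCH)
        batch_flush(b);
    return slot;
}
```

Unlike readahead around a contiguous run of existing slots, the contiguity here is created at allocation time, so the source pages can be arbitrarily scattered.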
--
error compiling committee.c: too many arguments to function