From: Nitin Gupta <ngupta@vflare.org>
To: Avi Kivity <avi@redhat.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jeremy@goop.org, hugh.dickins@tiscali.co.uk, JBeulich@novell.com,
chris.mason@oracle.com, kurt.hackel@oracle.com,
dave.mccracken@oracle.com, npiggin@suse.de,
akpm@linux-foundation.org, riel@redhat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
Date: Sun, 25 Apr 2010 21:35:34 +0530 [thread overview]
Message-ID: <4BD4684E.9040802@vflare.org> (raw)
In-Reply-To: <4BD4329A.9010509@redhat.com>
On 04/25/2010 05:46 PM, Avi Kivity wrote:
> On 04/25/2010 06:11 AM, Nitin Gupta wrote:
>> On 04/24/2010 11:57 PM, Avi Kivity wrote:
>>
>>> On 04/24/2010 04:49 AM, Nitin Gupta wrote:
>>>
>>>>
>>>>> I see. So why not implement this as an ordinary swap device, with a
>>>>> higher priority than the disk device? this way we reuse an API and
>>>>> keep
>>>>> things asynchronous, instead of introducing a special purpose API.
>>>>>
>>>>>
>>>>>
>>>> ramzswap is exactly this: an ordinary swap device which stores every
>>>> page
>>>> in (compressed) memory and its enabled as highest priority swap.
>>>> Currently,
>>>> it stores these compressed chunks in guest memory itself but it is not
>>>> very
>>>> difficult to send these chunks out to host/hypervisor using virtio.
>>>>
>>>> However, it suffers from unnecessary block I/O layer overhead and
>>>> requires
>>>> weird hooks in swap code, say to get notification when a swap slot is
>>>> freed.
>>>>
>>>>
>>> Isn't that TRIM?
>>>
>> No: trim or discard is not useful. The problem is that we require a
>> callback
>> _as soon as_ a page (swap slot) is freed. Otherwise, stale data
>> quickly accumulates
>> in memory defeating the whole purpose of in-memory compressed swap
>> devices (like ramzswap).
>>
>
> Doesn't flash have similar requirements? The earlier you discard, the
> likelier you are to reuse an erase block (or reduce the amount of copying).
>
No. We do not want to issue discard for every page as soon as it is freed.
I'm not flash expert but I guess issuing erase is just too expensive to be
issued so frequently. OTOH, ramzswap needs a callback for every page and as
soon as it is freed.
>> Increasing the frequency of discards is also not an option:
>> - Creating discard bio requests themselves need memory and these
>> swap devices
>> come into picture only under low memory conditions.
>>
>
> That's fine, swap works under low memory conditions by using reserves.
>
Ok, but still all this bio allocation and block layer overhead seems
unnecessary and is easily avoidable. I think frontswap code needs
clean up but at least it avoids all this bio overhead.
>> - We need to regularly scan swap_map to issue these discards.
>> Increasing discard
>> frequency also means more frequent scanning (which will still not be
>> fast enough
>> for ramzswap needs).
>>
>
> How does frontswap do this? Does it maintain its own data structures?
>
frontswap simply calls frontswap_flush_page() in swap_entry_free() i.e. as
soon as a swap slot is freed. No bio allocation etc.
>>> Maybe we should optimize these overheads instead. Swap used to always
>>> be to slow devices, but swap-to-flash has the potential to make swap act
>>> like an extension of RAM.
>>>
>>>
>> Spending lot of effort optimizing an overhead which can be completely
>> avoided
>> is probably not worth it.
>>
>
> I'm not sure. Swap-to-flash will soon be everywhere. If it's slow,
> people will feel it a lot more than ramzswap slowness.
>
Optimizing swap-to-flash is surely desirable but this problem is separate
from ramzswap or frontswap optimization. For the latter, I think dealing
with bio's, going through block layer is plain overhead.
>> Also, I think the choice of a synchronous style API for frontswap and
>> cleancache
>> is justified as they want to send pages to host *RAM*. If you want to
>> use other
>> devices like SSDs, then these should be just added as another swap
>> device as
>> we do currently -- these should not be used as frontswap storage
>> directly.
>>
>
> Even for copying to RAM an async API is wanted, so you can dma it
> instead of copying.
>
Maybe incremental development is better? Stabilize and refine existing
code and gradually move to async API, if required in future?
Thanks,
Nitin
WARNING: multiple messages have this Message-ID (diff)
From: Nitin Gupta <ngupta@vflare.org>
To: Avi Kivity <avi@redhat.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jeremy@goop.org, hugh.dickins@tiscali.co.uk, JBeulich@novell.com,
chris.mason@oracle.com, kurt.hackel@oracle.com,
dave.mccracken@oracle.com, npiggin@suse.de,
akpm@linux-foundation.org, riel@redhat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
Date: Sun, 25 Apr 2010 21:35:34 +0530 [thread overview]
Message-ID: <4BD4684E.9040802@vflare.org> (raw)
In-Reply-To: <4BD4329A.9010509@redhat.com>
On 04/25/2010 05:46 PM, Avi Kivity wrote:
> On 04/25/2010 06:11 AM, Nitin Gupta wrote:
>> On 04/24/2010 11:57 PM, Avi Kivity wrote:
>>
>>> On 04/24/2010 04:49 AM, Nitin Gupta wrote:
>>>
>>>>
>>>>> I see. So why not implement this as an ordinary swap device, with a
>>>>> higher priority than the disk device? this way we reuse an API and
>>>>> keep
>>>>> things asynchronous, instead of introducing a special purpose API.
>>>>>
>>>>>
>>>>>
>>>> ramzswap is exactly this: an ordinary swap device which stores every
>>>> page
>>>> in (compressed) memory and its enabled as highest priority swap.
>>>> Currently,
>>>> it stores these compressed chunks in guest memory itself but it is not
>>>> very
>>>> difficult to send these chunks out to host/hypervisor using virtio.
>>>>
>>>> However, it suffers from unnecessary block I/O layer overhead and
>>>> requires
>>>> weird hooks in swap code, say to get notification when a swap slot is
>>>> freed.
>>>>
>>>>
>>> Isn't that TRIM?
>>>
>> No: trim or discard is not useful. The problem is that we require a
>> callback
>> _as soon as_ a page (swap slot) is freed. Otherwise, stale data
>> quickly accumulates
>> in memory defeating the whole purpose of in-memory compressed swap
>> devices (like ramzswap).
>>
>
> Doesn't flash have similar requirements? The earlier you discard, the
> likelier you are to reuse an erase block (or reduce the amount of copying).
>
No. We do not want to issue discard for every page as soon as it is freed.
I'm not flash expert but I guess issuing erase is just too expensive to be
issued so frequently. OTOH, ramzswap needs a callback for every page and as
soon as it is freed.
>> Increasing the frequency of discards is also not an option:
>> - Creating discard bio requests themselves need memory and these
>> swap devices
>> come into picture only under low memory conditions.
>>
>
> That's fine, swap works under low memory conditions by using reserves.
>
Ok, but still all this bio allocation and block layer overhead seems
unnecessary and is easily avoidable. I think frontswap code needs
clean up but at least it avoids all this bio overhead.
>> - We need to regularly scan swap_map to issue these discards.
>> Increasing discard
>> frequency also means more frequent scanning (which will still not be
>> fast enough
>> for ramzswap needs).
>>
>
> How does frontswap do this? Does it maintain its own data structures?
>
frontswap simply calls frontswap_flush_page() in swap_entry_free() i.e. as
soon as a swap slot is freed. No bio allocation etc.
>>> Maybe we should optimize these overheads instead. Swap used to always
>>> be to slow devices, but swap-to-flash has the potential to make swap act
>>> like an extension of RAM.
>>>
>>>
>> Spending lot of effort optimizing an overhead which can be completely
>> avoided
>> is probably not worth it.
>>
>
> I'm not sure. Swap-to-flash will soon be everywhere. If it's slow,
> people will feel it a lot more than ramzswap slowness.
>
Optimizing swap-to-flash is surely desirable but this problem is separate
from ramzswap or frontswap optimization. For the latter, I think dealing
with bio's, going through block layer is plain overhead.
>> Also, I think the choice of a synchronous style API for frontswap and
>> cleancache
>> is justified as they want to send pages to host *RAM*. If you want to
>> use other
>> devices like SSDs, then these should be just added as another swap
>> device as
>> we do currently -- these should not be used as frontswap storage
>> directly.
>>
>
> Even for copying to RAM an async API is wanted, so you can dma it
> instead of copying.
>
Maybe incremental development is better? Stabilize and refine existing
code and gradually move to async API, if required in future?
Thanks,
Nitin
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-04-25 16:08 UTC|newest]
Thread overview: 163+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-22 13:42 Frontswap [PATCH 0/4] (was Transcendent Memory): overview Dan Magenheimer
2010-04-22 13:42 ` Dan Magenheimer
2010-04-22 15:28 ` Avi Kivity
2010-04-22 15:28 ` Avi Kivity
2010-04-22 15:48 ` Dan Magenheimer
2010-04-22 15:48 ` Dan Magenheimer
2010-04-22 16:13 ` Avi Kivity
2010-04-22 16:13 ` Avi Kivity
2010-04-22 20:15 ` Dan Magenheimer
2010-04-22 20:15 ` Dan Magenheimer
2010-04-23 9:48 ` Avi Kivity
2010-04-23 9:48 ` Avi Kivity
2010-04-23 13:47 ` Dan Magenheimer
2010-04-23 13:47 ` Dan Magenheimer
2010-04-23 13:57 ` Avi Kivity
2010-04-23 13:57 ` Avi Kivity
2010-04-23 14:43 ` Dan Magenheimer
2010-04-23 14:43 ` Dan Magenheimer
2010-04-23 14:52 ` Avi Kivity
2010-04-23 14:52 ` Avi Kivity
2010-04-23 15:00 ` Avi Kivity
2010-04-23 15:00 ` Avi Kivity
2010-04-23 16:26 ` Dan Magenheimer
2010-04-23 16:26 ` Dan Magenheimer
2010-04-24 18:25 ` Avi Kivity
2010-04-24 18:25 ` Avi Kivity
[not found] ` <1c02a94a-a6aa-4cbb-a2e6-9d4647760e91@default4BD43033.7090706@redhat.com>
2010-04-25 0:41 ` Dan Magenheimer
2010-04-25 0:41 ` Dan Magenheimer
2010-04-25 12:06 ` Avi Kivity
2010-04-25 12:06 ` Avi Kivity
2010-04-25 13:12 ` Dan Magenheimer
2010-04-25 13:12 ` Dan Magenheimer
2010-04-25 13:18 ` Avi Kivity
2010-04-25 13:18 ` Avi Kivity
2010-04-28 5:55 ` Pavel Machek
2010-04-28 5:55 ` Pavel Machek
2010-04-29 14:42 ` Dan Magenheimer
2010-04-29 14:42 ` Dan Magenheimer
2010-04-29 18:59 ` Avi Kivity
2010-04-29 18:59 ` Avi Kivity
2010-04-29 19:01 ` Avi Kivity
2010-04-29 19:01 ` Avi Kivity
2010-04-29 18:53 ` Avi Kivity
2010-04-29 18:53 ` Avi Kivity
2010-04-30 1:45 ` Dave Hansen
2010-04-30 1:45 ` Dave Hansen
2010-04-30 7:13 ` Avi Kivity
2010-04-30 7:13 ` Avi Kivity
2010-04-30 15:59 ` Dan Magenheimer
2010-04-30 15:59 ` Dan Magenheimer
2010-04-30 16:08 ` Dave Hansen
2010-04-30 16:08 ` Dave Hansen
2010-05-10 16:05 ` Martin Schwidefsky
2010-05-10 16:05 ` Martin Schwidefsky
2010-04-30 16:16 ` Avi Kivity
2010-04-30 16:16 ` Avi Kivity
[not found] ` <4BDB18CE.2090608@goop.org4BDB2069.4000507@redhat.com>
[not found] ` <3a62a058-7976-48d7-acd2-8c6a8312f10f@default20100502071059.GF1790@ucw.cz>
2010-04-30 16:43 ` Dan Magenheimer
2010-04-30 16:43 ` Dan Magenheimer
2010-04-30 17:10 ` Dave Hansen
2010-04-30 17:10 ` Dave Hansen
2010-04-30 18:08 ` Avi Kivity
2010-04-30 18:08 ` Avi Kivity
2010-04-30 17:52 ` Jeremy Fitzhardinge
2010-04-30 17:52 ` Jeremy Fitzhardinge
2010-04-30 18:24 ` Avi Kivity
2010-04-30 18:24 ` Avi Kivity
2010-04-30 18:59 ` Jeremy Fitzhardinge
2010-04-30 18:59 ` Jeremy Fitzhardinge
2010-05-01 8:28 ` Avi Kivity
2010-05-01 8:28 ` Avi Kivity
2010-05-01 17:10 ` Dan Magenheimer
2010-05-01 17:10 ` Dan Magenheimer
2010-05-02 7:11 ` Pavel Machek
2010-05-02 7:11 ` Pavel Machek
2010-05-02 15:05 ` Dan Magenheimer
2010-05-02 15:05 ` Dan Magenheimer
2010-05-02 20:06 ` Pavel Machek
2010-05-02 20:06 ` Pavel Machek
2010-05-02 21:05 ` Dan Magenheimer
2010-05-02 21:05 ` Dan Magenheimer
2010-05-02 7:57 ` Nitin Gupta
2010-05-02 7:57 ` Nitin Gupta
2010-05-02 16:06 ` Dan Magenheimer
2010-05-02 16:06 ` Dan Magenheimer
2010-05-02 16:48 ` Avi Kivity
2010-05-02 16:48 ` Avi Kivity
2010-05-02 17:22 ` Dan Magenheimer
2010-05-02 17:22 ` Dan Magenheimer
2010-05-03 9:39 ` Avi Kivity
2010-05-03 9:39 ` Avi Kivity
2010-05-03 14:59 ` Dan Magenheimer
2010-05-03 14:59 ` Dan Magenheimer
2010-05-02 15:35 ` Avi Kivity
2010-05-02 15:35 ` Avi Kivity
2010-05-02 17:06 ` Dan Magenheimer
2010-05-02 17:06 ` Dan Magenheimer
2010-05-03 8:46 ` Avi Kivity
2010-05-03 8:46 ` Avi Kivity
2010-05-03 16:01 ` Dan Magenheimer
2010-05-03 16:01 ` Dan Magenheimer
2010-05-03 19:32 ` Pavel Machek
2010-05-03 19:32 ` Pavel Machek
2010-04-30 16:04 ` Dave Hansen
2010-04-30 16:04 ` Dave Hansen
2010-04-23 15:56 ` Dan Magenheimer
2010-04-23 15:56 ` Dan Magenheimer
2010-04-24 18:22 ` Avi Kivity
2010-04-24 18:22 ` Avi Kivity
2010-04-25 0:30 ` Dan Magenheimer
2010-04-25 0:30 ` Dan Magenheimer
2010-04-25 12:11 ` Avi Kivity
2010-04-25 12:11 ` Avi Kivity
[not found] ` <c5062f3a-3232-4b21-b032-2ee1f2485ff0@default4BD44E74.2020506@redhat.com>
2010-04-25 13:37 ` Dan Magenheimer
2010-04-25 13:37 ` Dan Magenheimer
2010-04-25 14:15 ` Avi Kivity
2010-04-25 14:15 ` Avi Kivity
2010-04-25 15:29 ` Dan Magenheimer
2010-04-25 15:29 ` Dan Magenheimer
2010-04-26 6:01 ` Avi Kivity
2010-04-26 6:01 ` Avi Kivity
2010-04-26 12:45 ` Dan Magenheimer
2010-04-26 12:45 ` Dan Magenheimer
2010-04-26 13:48 ` Avi Kivity
2010-04-26 13:48 ` Avi Kivity
2010-04-27 12:56 ` Pavel Machek
2010-04-27 12:56 ` Pavel Machek
2010-04-27 14:32 ` Dan Magenheimer
2010-04-27 14:32 ` Dan Magenheimer
2010-04-29 13:02 ` Pavel Machek
2010-04-29 13:02 ` Pavel Machek
2010-04-27 11:52 ` Valdis.Kletnieks
2010-04-27 0:49 ` Jeremy Fitzhardinge
2010-04-27 0:49 ` Jeremy Fitzhardinge
2010-04-27 12:55 ` Pavel Machek
2010-04-27 12:55 ` Pavel Machek
2010-04-27 14:43 ` Nitin Gupta
2010-04-27 14:43 ` Nitin Gupta
2010-04-29 13:04 ` Pavel Machek
2010-04-29 13:04 ` Pavel Machek
2010-04-24 1:49 ` Nitin Gupta
2010-04-24 1:49 ` Nitin Gupta
2010-04-24 18:27 ` Avi Kivity
2010-04-24 18:27 ` Avi Kivity
2010-04-25 3:11 ` Nitin Gupta
2010-04-25 3:11 ` Nitin Gupta
2010-04-25 12:16 ` Avi Kivity
2010-04-25 12:16 ` Avi Kivity
2010-04-25 16:05 ` Nitin Gupta [this message]
2010-04-25 16:05 ` Nitin Gupta
2010-04-26 6:06 ` Avi Kivity
2010-04-26 6:06 ` Avi Kivity
2010-04-26 12:50 ` Dan Magenheimer
2010-04-26 12:50 ` Dan Magenheimer
2010-04-26 13:43 ` Avi Kivity
2010-04-26 13:43 ` Avi Kivity
2010-04-27 8:29 ` Dan Magenheimer
2010-04-27 8:29 ` Dan Magenheimer
2010-04-27 9:21 ` Avi Kivity
2010-04-27 9:21 ` Avi Kivity
2010-04-26 13:47 ` Nitin Gupta
2010-04-26 13:47 ` Nitin Gupta
2010-04-23 16:35 ` Jiahua
2010-04-23 16:35 ` Jiahua
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BD4684E.9040802@vflare.org \
--to=ngupta@vflare.org \
--cc=JBeulich@novell.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=chris.mason@oracle.com \
--cc=dan.magenheimer@oracle.com \
--cc=dave.mccracken@oracle.com \
--cc=hugh.dickins@tiscali.co.uk \
--cc=jeremy@goop.org \
--cc=kurt.hackel@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.