From: Andrea Arcangeli <aarcange@redhat.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>,
Cyclonus J <cyclonusj@gmail.com>,
Sasha Levin <levinsasha928@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
David Rientjes <rientjes@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Konrad Wilk <konrad.wilk@oracle.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
Seth Jennings <sjenning@linux.vnet.ibm.com>,
ngupta@vflare.org, Chris Mason <chris.mason@oracle.com>,
JBeulich@novell.com, Dave Hansen <dave@linux.vnet.ibm.com>,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
Date: Mon, 31 Oct 2011 23:37:17 +0100
Message-ID: <20111031223717.GI3466@redhat.com>
In-Reply-To: <60592afd-97aa-4eaf-b86b-f6695d31c7f1@default>
On Mon, Oct 31, 2011 at 01:58:39PM -0700, Dan Magenheimer wrote:
> Hmmm... not sure I understand this one. It IS copy-based
> so is not zerocopy; the page of data is actually moving out
Copy-based is my main problem; being synchronous is no big deal, I
agree.
I mean, I don't see why you have to make one copy before you start
compressing and then you write to disk the output of the compression
algorithm. To me it looks like this API forces on zcache one more copy
than necessary.
I can't see why this copy is necessary and why zcache isn't working on
"struct page" on core kernel structures instead of moving the memory
off to a memory object invisible to the core VM.
> TRUE. Tell me again why a vmexit/vmenter per 4K page is
> "impossible"? Again you are assuming (1) the CPU had some
It's certainly not impossible; it's just impossible that we'd want it,
as it'd be too slow.
> real work to do instead and (2) that vmexit/vmenter is horribly
Sure, the CPU has another 1000 VMs to schedule. This is like saying
virtio-blk isn't needed on desktop virt because the desktop isn't
doing much I/O. Absurd argument: there are another 1000 desktops doing
I/O at the same time, of course.
> slow. Even if vmexit/vmenter is thousands of cycles, it is still
> orders of magnitude faster than a disk access. And vmexit/vmenter
I fully agree tmem is faster for Xen than no tmem. That's not the
point: we don't need such an elaborate hack hiding pages from the
guest OS in order to share pagecache. Our hypervisor is just a bit
more powerful and has a function called file_read_actor that does what
your tmem copy does...
> is about the same order of magnitude as page copy, and much
> faster than compression/decompression, both of which still
> result in a nice win.
Saying it's a small overhead is not like saying it is _needed_. Why
not add a udelay(1) in it too? Sure it won't be noticeable.
> You are also assuming that frontswap puts/gets are highly
> frequent. By definition they are not, because they are
> replacing single-page disk reads/writes due to swapping.
They'll be as frequent as the highmem bounce buffers...
> That said, the API/ABI is very extensible, so if it were
> proven that batching was sufficiently valuable, it could
> be added later... but I don't see it as a showstopper.
> Really do you?
That's fine with me... but like ->writepages, it'll take ages for the
fs to switch from writepage to writepages. Considering this is a new
API, I don't think it's unreasonable to ask that it at least handle
zerocopy behavior from the start. That means exposing the userland
mapping to the tmem layer so it can avoid the copy and read from the
userland address. Xen will badly choke if it ever tries to do that,
but zcache should be ok with that.
Now there may be algorithms where the page must be stable, but others
will be perfectly fine even if the page is changing under the
compression, and in that case the page won't be discarded and it'll be
marked dirty again. So even if wrong data goes to disk, we'll
rewrite it later. I see no reason why there always has to be a copy
before starting any compression/encryption, as long as the algorithm
will not crash when its input data is changing under it.
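
A rough userland sketch of that "read without copying, recheck
afterwards" idea. Every name here is made up for illustration and none
of it is an existing kernel interface. A checksum stands in for
compression, and the racing writer is simulated by a callback because
the sketch is single-threaded; real code would need proper atomics and
barriers around the dirty bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEN 64u

/* Hypothetical stand-in for a page plus its dirty bit. */
struct sketch_upage {
    uint8_t data[SKETCH_LEN];
    bool dirty;              /* set by anything that modifies data */
};

/* Hypothetical hook used to simulate a writer racing with the reader. */
typedef void (*sketch_racer_fn)(struct sketch_upage *pg);

/* Example racer: modifies the page and marks it dirty again. */
void sketch_touch(struct sketch_upage *pg)
{
    pg->data[0] ^= 1u;
    pg->dirty = true;
}

/*
 * Consume the page contents in place, with no private copy.
 * Returns true if the output matches a stable page and the page could
 * be discarded; false if the page was redirtied while being read, in
 * which case the output is stale and the page must be written again.
 */
bool sketch_store_nocopy(struct sketch_upage *pg, uint32_t *out,
                         sketch_racer_fn racer)
{
    uint32_t sum = 0;

    assert(pg != NULL && out != NULL);
    pg->dirty = false;       /* about to write out the current contents */
    for (size_t i = 0; i < SKETCH_LEN; i++) {
        if (racer != NULL && i == SKETCH_LEN / 2)
            racer(pg);       /* simulated concurrent modification */
        sum = sum * 31u + pg->data[i];
    }
    *out = sum;
    return !pg->dirty;       /* redirtied: keep the page, rewrite later */
}
```

Calling sketch_store_nocopy() with a NULL racer models the stable
case; passing sketch_touch models a page changing under the algorithm,
where the caller keeps the page and simply tries again later.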
The ideal API would be to send down page pointers (handling
compound pages too), not to copy. Maybe with a flag so you can also
specify offsets and send down partial pages, down to byte
granularity. The "copy input data before anything else can happen"
approach looks flawed to me. It is not flawed for Xen, because Xen has
no knowledge of the guest "struct page", but here I'm talking about the
non-virt usages.
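
To make the shape of such an interface concrete, here is a minimal
userland sketch. The types and function names are hypothetical, not
anything in the tree: sketch_page stands in for "struct page", and a
checksum stands in for whatever the backend does with the bytes. The
point is only that the backend reads straight from the caller's pages
through (page, offset, length) descriptors, with no intermediate copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a kernel "struct page": just its bytes. */
struct sketch_page {
    uint8_t data[SKETCH_PAGE_SIZE];
};

/* Hypothetical descriptor: which page, and which byte range inside it. */
struct sketch_extent {
    const struct sketch_page *page;
    size_t offset;           /* first byte within the page */
    size_t len;              /* number of bytes, may be a partial page */
};

/*
 * Toy backend: walks the extents and consumes the bytes in place.
 * The rolling checksum is only a placeholder for compression or
 * encryption; nothing is copied out of the source pages.
 */
uint32_t sketch_put_extents(const struct sketch_extent *ext, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        assert(ext[i].page != NULL);
        assert(ext[i].offset + ext[i].len <= SKETCH_PAGE_SIZE);
        const uint8_t *p = ext[i].page->data + ext[i].offset;
        for (size_t j = 0; j < ext[i].len; j++)
            sum = sum * 31u + p[j];
    }
    return sum;
}
```

A caller would build an array of sketch_extent entries, one per page
or partial page, and hand the whole array down in one call, which also
leaves room for batching later.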
> So, please, all the other parts necessary for tmem are
> already in-tree, why all the resistance about frontswap?
Well, my comments are generic, not specific to frontswap.