From: Andrea Arcangeli <aarcange@redhat.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>,
Cyclonus J <cyclonusj@gmail.com>,
Sasha Levin <levinsasha928@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
David Rientjes <rientjes@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Konrad Wilk <konrad.wilk@oracle.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
Seth Jennings <sjenning@linux.vnet.ibm.com>,
ngupta@vflare.org, Chris Mason <chris.mason@oracle.com>,
JBeulich@novell.com, Dave Hansen <dave@linux.vnet.ibm.com>,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
Date: Mon, 31 Oct 2011 19:16:51 +0100
Message-ID: <20111031181651.GF3466@redhat.com>
In-Reply-To: <552d2067-474d-4aef-a9a4-89e5fd8ef84f@default>
On Fri, Oct 28, 2011 at 08:21:31AM -0700, Dan Magenheimer wrote:
> real users and real distros and real products waiting, so if there
> are any real issues, let's get them resolved.
We already told you what the real issues are, and so far you have done
nothing to address them. So much has been built on top of a flawed API
that I guess an earthquake of massive scale would be needed to actually
convince Xen to change any of the huge amount of code built on it.
I don't know the exact Xen details (it's possible the Xen design
doesn't allow the 4 issues below to be fixed, I've no idea), but for
all the other non-virt usages (compressed-swap/compressed-pagecache,
ramster) I doubt it is impossible to change the design of the tmem API
to address at least one of the basic, huge problems that such an API
imposes:
1) 4k page limit (no way to handle hugepages)
Ok, swapcache and pagecache are always 4k, but that may change. Plus
it's generally flawed these days to add a new API that people will
build code on and that can't handle hugepages; at least hugetlbfs
should be handled. Especially considering it was born for virt: in
virt space we only work with hugepages.
2) synchronous
3) not zerocopy: requires one bounce buffer for every get and one
bounce buffer again for every put (like highmem I/O with 32bit pci)
In my view point 3 is definitely fixable for swapcache compression
and pagecache compression. There's no way we can accept a copy before
starting to compress the data: the source of the compression
algorithm must be the _userland_ page, but instead you copy first and
compress on the copy destination, correct me if I'm wrong.
4) can't handle batched requests
It requires one vmexit for each 4k page accessed if the KVM hypervisor
wants to access tmem. It's impossible that we'd want to use this in
KVM: at most we could consider exiting for every 2M page. It's
impossible to vmexit every 4k, or performance is destroyed and we'd
run as slow as without EPT/NPT.
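To put a rough number on point 4, here is a minimal sketch of the
arithmetic. The helper name and the size macros are hypothetical,
made up for illustration; they are not part of tmem or KVM. It only
counts how many exits are needed when each exit can move at most one
unit of data.

```c
#include <assert.h>

#define SMALL_PAGE_SIZE (4UL * 1024)         /* 4k base page */
#define HUGE_PAGE_SIZE  (2UL * 1024 * 1024)  /* 2M hugepage */

/*
 * Hypothetical helper: number of guest exits needed to transfer
 * `bytes` when a single exit can move at most `unit` bytes.
 */
static unsigned long exits_needed(unsigned long bytes, unsigned long unit)
{
	assert(unit != 0);
	return (bytes + unit - 1) / unit;  /* round up to whole units */
}
```

With these definitions, exits_needed(HUGE_PAGE_SIZE, SMALL_PAGE_SIZE)
is 512, while exits_needed(HUGE_PAGE_SIZE, HUGE_PAGE_SIZE) is 1, so
going through a 4k-only interface costs 512 exits for the same 2M of
data that one exit per hugepage would cover.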
Address these 4 points (or at least the ones that are solvable) and
it'll become appealing. Or at least try to explain why it's impossible
to solve all these 4 points to convince us this API is the best we can
get for the non-virt usages (let's ignore Xen/KVM for the sake of this
discussion, as Xen may have legitimate reasons for why those 4 above
points are impossible to fix).
At the moment it still looks to me like a legacy-compatibility API
that makes life easier for Xen users: it uses a limited API (at least
it's simpler, I'd agree on it being simpler this way) to share cache
across different guests, and it tries to impose the above 4 limits
(and horrendous performance in accessing tmem from a Xen guest, but
still faster than I/O, isn't it? :) even on the non-virt usages.
Even for frontswap, there is no way we can accept doing synchronous
bounce buffering for every single 4k page that is going to hit swap.
That's worse than HIGHMEM 32bit... Obviously you must be mlocking all
Oracle db memory so you won't ever hit that bounce buffering with
Oracle. Also note, historically there's nobody that hated bounce
buffers more than Oracle (at least I remember the highmem issues with
pci32 cards :). Also Oracle was the biggest user of hugetlbfs.
So it sounds weird that you like an API that forces CPU
cache-destroying bounce buffering and 4k page units on everything
that passes through it.
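To make the bounce buffer complaint concrete, here is a toy
userspace sketch of the two shapes of a "put". It is not the
frontswap or zcache code: the run-length encoder only stands in for
a real compressor, and every name in it (rle_compress, put_bounce,
put_direct, PAGE_SZ) is made up for illustration. Both paths produce
the same output; the only difference is the extra 4k copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/*
 * Toy run-length encoder standing in for a real compressor.
 * Emits (run length, byte) pairs with runs capped at 255.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t rle_compress(const unsigned char *src, size_t len,
			   unsigned char *dst, size_t cap)
{
	size_t out = 0, i = 0;

	assert(src && dst);
	while (i < len) {
		unsigned char b = src[i];
		size_t run = 1;

		while (i + run < len && src[i + run] == b && run < 255)
			run++;
		if (out + 2 > cap)
			return 0;
		dst[out++] = (unsigned char)run;
		dst[out++] = b;
		i += run;
	}
	return out;
}

/* Bounce path: copy the page first, then compress the copy. */
static size_t put_bounce(const unsigned char *page, unsigned char *dst,
			 size_t cap, size_t *copied)
{
	unsigned char bounce[PAGE_SZ];

	memcpy(bounce, page, PAGE_SZ);
	*copied += PAGE_SZ;  /* account for the extra copy */
	return rle_compress(bounce, PAGE_SZ, dst, cap);
}

/* Direct path: the compressor reads the source page itself. */
static size_t put_direct(const unsigned char *page, unsigned char *dst,
			 size_t cap, size_t *copied)
{
	(void)copied;  /* nothing is copied before compressing */
	return rle_compress(page, PAGE_SZ, dst, cap);
}
```

In this sketch the bounce path touches every page twice (once to
copy, once to compress) and pulls an extra 4k through the CPU cache
per put, which is the cost being objected to above.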
If I'm wrong please correct me, I haven't had much time to check the
code. But we already raised these points before without getting much
of an answer.
Thanks,
Andrea