From: Andrea Arcangeli <andrea@suse.de>
To: Daniel Phillips <phillips@bonn-fries.net>
Cc: Bill Davidsen <davidsen@tmr.com>,
Mike Fedyk <mfedyk@matchmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: 2.4.19pre1aa1
Date: Mon, 4 Mar 2002 03:25:35 +0100
Message-ID: <20020304032535.F20606@dualathlon.random>
In-Reply-To: <20020301013056.GD2711@matchmail.com> <E16hdgg-0000Py-00@starship.berlin> <20020304014950.E20606@dualathlon.random> <E16hhYV-0000Qz-00@starship.berlin>
In-Reply-To: <E16hhYV-0000Qz-00@starship.berlin>
On Mon, Mar 04, 2002 at 02:46:22AM +0100, Daniel Phillips wrote:
> On March 4, 2002 01:49 am, Andrea Arcangeli wrote:
> > On Sun, Mar 03, 2002 at 10:38:34PM +0100, Daniel Phillips wrote:
> > > On March 2, 2002 03:06 am, Andrea Arcangeli wrote:
> > > > On Thu, Feb 28, 2002 at 10:26:48PM -0500, Bill Davidsen wrote:
> > > > > rather than patches. But there are a lot more small machines (which I feel
> > > > > are better served by rmap) than large. I would like to leave the jury out
> > > >
> > > > I think there's some confusion going on among the rmap users; let's
> > > > clarify the facts.
> > > >
> > > > The rmap design in the VM is all about decreasing the complexity of
> > > > swap_out on the huge boxes (so it's all about saving CPU), by slowing
> > > > down a lot of common fast paths like page faults and by paying with
> > > > some memory too. See the lmbench numbers posted by Randy after applying
> > > > rmap to see what I mean.
> > >
> > > Do you know any reason why rmap must slow down the page fault fast path, or
> > > are you just thinking about Rik's current implementation? Yes, rmap has to
> > > add a pte_chain entry there, but it can be a direct pointer in the unshared
> > > case and the spinlock looks like it can be avoided in the common case as well.
> >
> > unshared isn't the very common case (shm, and file mappings like
> > executables are all going to be shared, not unshared).
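To make the "direct pointer in the unshared case" idea concrete, here is a minimal user-space sketch. It is not Rik's code: every name (toy_page, toy_pte_chain, toy_page_add_rmap() and friends) is hypothetical, and locking is left out. The first mapping stores a plain pointer with no allocation; only when a second pte maps the page does the slot turn into a pte_chain, which is the path shared pages such as shm and executables would end up taking.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long toy_pte_t;   /* hypothetical stand-in for pte_t */

/* One reverse-map node: points back at a pte that maps the page. */
struct toy_pte_chain {
    toy_pte_t *ptep;
    struct toy_pte_chain *next;
};

/* Hypothetical page descriptor; 'state' says which union member is valid. */
struct toy_page {
    enum { TOY_UNMAPPED, TOY_DIRECT, TOY_CHAIN } state;
    union {
        toy_pte_t *ptep;               /* TOY_DIRECT: the only mapper */
        struct toy_pte_chain *chain;   /* TOY_CHAIN: list of mappers */
    } u;
    int chain_allocs;                  /* allocations done on the fault path */
};

static struct toy_pte_chain *toy_chain_node(struct toy_page *page,
                                            toy_pte_t *ptep,
                                            struct toy_pte_chain *next)
{
    struct toy_pte_chain *pc = malloc(sizeof(*pc));

    assert(pc != NULL);
    pc->ptep = ptep;
    pc->next = next;
    page->chain_allocs++;
    return pc;
}

/* Fault path: record that 'ptep' now maps 'page'. */
void toy_page_add_rmap(struct toy_page *page, toy_pte_t *ptep)
{
    switch (page->state) {
    case TOY_UNMAPPED:
        /* Unshared case: no allocation, just remember the pte. */
        page->state = TOY_DIRECT;
        page->u.ptep = ptep;
        return;
    case TOY_DIRECT: {
        /* Second mapper: convert the direct pointer into a chain. */
        toy_pte_t *first = page->u.ptep;

        page->state = TOY_CHAIN;
        page->u.chain = toy_chain_node(page, first, NULL);
        break;
    }
    case TOY_CHAIN:
        break;
    }
    page->u.chain = toy_chain_node(page, ptep, page->u.chain);
}

/* Number of ptes recorded for 'page'. */
int toy_page_mapcount(const struct toy_page *page)
{
    const struct toy_pte_chain *pc;
    int n = 0;

    if (page->state != TOY_CHAIN)
        return page->state == TOY_DIRECT;
    for (pc = page->u.chain; pc != NULL; pc = pc->next)
        n++;
    return n;
}

/* Drop all reverse mappings, freeing any chain nodes. */
void toy_page_clear_rmap(struct toy_page *page)
{
    if (page->state == TOY_CHAIN) {
        while (page->u.chain != NULL) {
            struct toy_pte_chain *next = page->u.chain->next;

            free(page->u.chain);
            page->u.chain = next;
        }
    }
    page->state = TOY_UNMAPPED;
}
```

With one mapper chain_allocs stays at 0; the second mapper costs two allocations (the converted first entry plus the new one) and each further mapper costs one.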
>
> As soon as you have shared pages you start to benefit from rmap's ability
> to unmap in one step, so the cost of creating the link is recovered by not

We'd benefit with unshared pages too.

BTW, for MAP_SHARED mappings we already collect the rmap information,
since we need it for vmtruncate, but it's not laid out for efficient
browsing; it's only meant to make vmtruncate work.

> having to scan two page tables to unmap it. In theory. Do you see a hole
> in that?

Only the fact that you never need the reverse lookup in many important
production workloads (the first that comes to mind is when you have
enough RAM to do your job: all number crunching and fileserving, and
most servers are set up that way). This is the whole point. Note that
this has nothing to do with the "cache" part; it is only about the
pageout/swapout stage, and only a few servers really need heavy swapout.
The background swapout that keeps unused services from staying in RAM
forever doesn't matter, with or without the rmap design.

In the other case (heavy swapout/pageout, as in some demanding DBMS
workloads, simulations, and laptops or legacy desktops) we would mostly
save CPU and reduce complexity, but so far I don't see significant
system load during heavy pageout/swapout, so I don't see an obvious
need to save CPU there either.
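The trade-off argued above can be sketched as a count of ptes visited to unmap one page. This is a toy model with hypothetical structures (flat arrays instead of real page tables): the forward, swap_out-style scan touches every pte of every mm, while the reverse lookup touches only the ptes that map the page. Which cost matters depends on how often you actually need to unmap, and the point above is that many well-provisioned servers rarely do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat "page table": pte[i] holds a frame number, 0 = none. */
struct toy_mm {
    unsigned long *pte;
    size_t nr_ptes;
};

/*
 * Forward (swap_out style): to find every pte mapping 'pfn' we have to
 * walk all ptes of all mms. Returns ptes cleared; *visited counts work.
 */
size_t toy_unmap_forward(struct toy_mm *mms, size_t nr_mms,
                         unsigned long pfn, size_t *visited)
{
    size_t i, j, cleared = 0;

    *visited = 0;
    for (i = 0; i < nr_mms; i++) {
        for (j = 0; j < mms[i].nr_ptes; j++) {
            (*visited)++;
            if (mms[i].pte[j] == pfn) {
                mms[i].pte[j] = 0;
                cleared++;
            }
        }
    }
    return cleared;
}

/*
 * Reverse (rmap style): the page carries pointers to the ptes that map
 * it, so the work is proportional to the number of mappers only.
 */
size_t toy_unmap_reverse(unsigned long **rmap, size_t nr_rmap,
                         size_t *visited)
{
    size_t i;

    for (i = 0; i < nr_rmap; i++) {
        assert(rmap[i] != NULL);
        *rmap[i] = 0;
    }
    *visited = nr_rmap;
    return nr_rmap;
}
```

The reverse path is cheaper per unmap, but its cost was already paid earlier, when the rmap entries were built on every fault.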

Probably the main difference visible in the numbers would in fact come
from following a perfect LRU, but giving mapped pages a higher chance is
really beneficial. Another aspect of the current design, the round-robin
cycling over the whole VM that clears the accessed bit and activates
physical pages if needed, can also be seen as a feature in some ways. It
is much better at providing a kind of "clock based" aging of the
accessed-bit information, while an rmap-aware LRU pass wouldn't treat
all the virtual pages as fairly as we do now.
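The "clock based" aging described above can be sketched as a second-chance sweep over virtual pages. This is a simplification under stated assumptions: one flat array stands in for all page tables, and the hypothetical toy_ppage.active flag stands in for moving a page to the active list. Each virtual page is visited at the same rate, and a physical page gets activated if any of its mappings was referenced since the hand last passed it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical page: 'active' models being on the active list. */
struct toy_ppage {
    int active;
};

/* Hypothetical virtual page (pte): 'young' models the accessed bit. */
struct toy_vpage {
    int young;
    struct toy_ppage *page;
};

/* Round-robin state over all virtual pages. */
struct toy_clock {
    struct toy_vpage *vp;
    size_t nr;
    size_t hand;   /* persists across calls, like a swap_out cursor */
};

/*
 * Advance the hand until a virtual page not accessed since the last pass
 * is found. Accessed pages get their bit cleared and their physical page
 * activated (a second chance). Returns the index of the victim pte.
 * Terminates within two passes because every pass clears young bits.
 */
size_t toy_clock_scan(struct toy_clock *c)
{
    assert(c->nr > 0);
    for (;;) {
        size_t idx = c->hand;
        struct toy_vpage *v = &c->vp[idx];

        c->hand = (c->hand + 1) % c->nr;
        if (v->young) {
            v->young = 0;
            v->page->active = 1;
            continue;
        }
        return idx;
    }
}
```

An rmap-driven LRU pass would instead look at physical pages in LRU order and consult their mappings, which changes how often each individual mapping gets aged.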
> > So unless you first share all the pagetables as well (like Ben once said
> > years ago), it's not going to be a direct pointer in the very common
> > case. And there's no guarantee you can share the pagetable (even
> > assuming the kernel supports that at the maximum possible degree across
> > execve and at random mmaps too) if you map those pages at different
> > virtual addresses.
>
> The virtual alignment just needs to be the same modulo 4 MB. There are
> other requirements as well, but being able to share seems to be the common
> case.

Yep, on x86 without PAE. With PAE enabled (or on an x86-64 kernel) the
physical pages need the same layout within a naturally aligned 2M chunk.
I trust that will often match in theory, but tracking it over execve
and random mmaps doesn't look easy; I think to track it we'd really
need the rmap information for everything (not just for MAP_SHARED like
right now). Also, doing all the checks and walking the reverse maps
won't be zero cost, but I can see the benefit of full pte sharing
(starting with CPU cache utilization across TLB flushes).
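The alignment condition can be made concrete: one page-table page holds 1024 four-byte entries on x86 without PAE, mapping 1024 x 4 KB = 4 MB, and 512 eight-byte entries with PAE or on x86-64, mapping 512 x 4 KB = 2 MB. Two mappings of the same file can therefore only share pte pages if their addresses agree modulo that span. A minimal sketch with hypothetical helper names; it checks only the address condition, not the further requirement above that the same physical pages be laid out across the whole naturally aligned chunk.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL

/* Bytes of virtual address space covered by one page-table page. */
unsigned long toy_pte_page_span(int pae)
{
    /* 4-byte ptes: 1024 per 4 KB page; 8-byte ptes (PAE/x86-64): 512. */
    unsigned long entries = pae ? 512 : 1024;

    return entries * TOY_PAGE_SIZE;
}

/* Necessary condition: same offset inside a naturally aligned span. */
int toy_same_offset(unsigned long va1, unsigned long va2, unsigned long span)
{
    assert((span & (span - 1)) == 0);   /* span must be a power of two */
    return (va1 & (span - 1)) == (va2 & (span - 1));
}

/* Slot of 'va' inside the page-table page that maps it. */
unsigned long toy_pte_index(unsigned long va, int pae)
{
    return (va & (toy_pte_page_span(pae) - 1)) / TOY_PAGE_SIZE;
}
```

For example, 0x08048000 and 0x40248000 differ by 898 MB: that is a multiple of 2 MB but not of 4 MB, so the PAE span condition holds while the non-PAE one does not.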

In fact, rmap may turn out to be more useful for things like enabling
the full pagetable sharing you're suggesting above than for replacing
the swap_out round-robin cycle over the VM, so it might be used only for
MM internals rather than for VM internals.

Andrea