From: Ulrich Drepper <drepper@redhat.com>
To: Blaisorblade <blaisorblade@yahoo.it>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org,
Linux Memory Management <linux-mm@kvack.org>,
Val Henson <val.henson@intel.com>
Subject: Re: [patch 00/14] remap_file_pages protection support
Date: Sat, 06 May 2006 09:05:29 -0700
Message-ID: <445CC949.7050900@redhat.com>
In-Reply-To: <200605030225.54598.blaisorblade@yahoo.it>
Blaisorblade wrote:
> I've not seen the numbers indeed, I've been told of a problem with a "customer
> program" and Ingo connected my work with this problem. Frankly, I've been
> always astonished about how looking up a 10-level tree can be slow. Poor
> cache locality is the only thing that I could think about.
It might be good if I explain a bit how much we use mmap in libc. The
numbers can really add up quickly.
- for each loaded DSO we might have up to 5 VMAs today:
1. text segment, protection r-xp, normally never written to
2. a gap, protection ---p (i.e., no access)
3. a relro data segment, protection r--p
4. a data segment, protection rw-p
5. a bss segment, anonymous memory
The first four are mapped from the file. In fact, the first segment
"allocates" the entire address space of all segments, even if it's longer
than the file.
Then the gap is done using mprotect(PROT_NONE). Then the area for segments 3
and 4 is mapped in one mmap() call. It's in the same file but the
offset used in the mmap is not the same as the offset which
naturally is already established through the first mmap. I.e., if the
first mmap() would start at offset 0 and continue for 1000 pages, the
gap might start at, say, an offset of 4 pages and continue for 500 pages.
Then the "natural" offset of the first data page would be 504 pages but
the second mmap() call would in fact use the offset 4 because the text
and data segments are contiguous in the _file_ (although not in memory).
Anyway, once relocations are done the protection of the relro segment is
changed, splitting the data segment in two.
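To make the sequence above concrete, here is a rough sketch in C. It is
not the actual ld.so code: map_dso_like() and its page-count parameters
are made up for illustration, the bss step is left out, and the relro
region is assumed to sit at the start of the data mapping.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not glibc code.  All counts are in pages.  In the
 * file the data follows the text directly; in memory a gap of gap_pages
 * sits between them.  Returns the base address or MAP_FAILED.
 */
static void *map_dso_like(const char *path, size_t text_pages,
                          size_t gap_pages, size_t data_pages,
                          size_t relro_pages)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = (text_pages + gap_pages + data_pages) * pg;
    char *base, *data;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    /* One mapping at file offset 0 reserves the range of all segments. */
    base = mmap(NULL, total, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }

    /* The gap becomes inaccessible; this splits the first VMA. */
    if (mprotect(base + text_pages * pg, gap_pages * pg, PROT_NONE) != 0)
        goto fail;

    /* Segments 3 and 4 in one call: placed after the gap in memory, but
       taken from file offset text_pages, not text_pages + gap_pages. */
    data = mmap(base + (text_pages + gap_pages) * pg, data_pages * pg,
                PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED,
                fd, (off_t)(text_pages * pg));
    if (data == MAP_FAILED)
        goto fail;

    /* (Relocations would be applied here.)  Making the relro part
       read-only afterwards splits the data mapping in two. */
    if (relro_pages > 0 && mprotect(data, relro_pages * pg, PROT_READ) != 0)
        goto fail;

    close(fd);          /* the mappings stay valid after close() */
    return base;

fail:
    munmap(base, total);
    close(fd);
    return MAP_FAILED;
}
```

Calling it as map_dso_like("/proc/self/exe", 1, 2, 2, 1) and then looking
at /proc/self/maps should show the separate text, gap, relro and data
entries for that range.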
So, for DSO loading there would be two steps of improvement:
1. if an mprotect() call wouldn't split the VMA we would have 3 VMAs in
the end instead of 5. 40% gain.
2. if I could use remap_file_pages() for the data segment mapping and
the call would allow changing the protection and it would not split the
VMAs, then we'd be down to 2 mappings. 60% down.
A second big VMA user is thread stacks. I think the application which
was mentioned in this thread briefly used literally thousands of
threads. Leaving aside the insanity of this (it's unfortunately how
many programmers work) this can create problems because we get at least
two (on IA-64 three) VMAs per thread. I.e., thread stack allocation
works like this:
1. allocate area big enough for stack and guard (we don't use automatic
growing, this cannot work)
2. change the protection of the guard end of the stack to PROT_NONE.
So, for say 1000 threads we'll end up with 2000 VMAs. Threads are also
important to mention here because
- often they are short-lived and we have to recreate them often. We
usually reuse stacks but only keep that much allocated memory around.
So more often than we like we actually free and later re-allocate stacks.
- these thousands of stack VMAs are really used concurrently. All
threads are woken over a period of time.
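The two-step stack allocation described above can be sketched roughly
like this. alloc_stack_with_guard() is a made-up name, not the libc
function; the sketch assumes both sizes are multiples of the page size
and that the stack grows down, so the guard sits at the low end.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not the libc implementation.  Returns the lowest
 * usable stack address, or NULL on failure.  The caller frees the whole
 * area with munmap(result - guard_size, stack_size + guard_size).
 */
static void *alloc_stack_with_guard(size_t stack_size, size_t guard_size)
{
    size_t total = stack_size + guard_size;

    /* Step 1: one anonymous mapping big enough for stack and guard. */
    char *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    /* Step 2: the guard end becomes PROT_NONE.  This is the call that
       turns one VMA into two. */
    if (mprotect(mem, guard_size, PROT_NONE) != 0) {
        munmap(mem, total);
        return NULL;
    }

    return mem + guard_size;
}
```

Each such call leaves one rw-p and one ---p entry behind, which is where
the two VMAs per thread come from.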
A third source of VMAs is anonymous memory allocation. mmap is used in
the malloc implementation and directly in various places. For
randomization reasons there isn't really much we can do here, we
shouldn't lump all these allocations together.
A fourth source of VMAs is the programs themselves, which mmap files.
Often read-only mappings of many small files.
Put all this together and non-trivial apps as written today (I don't say
they are high-quality apps) can easily have a few thousand, maybe even
10,000 to 20,000 VMAs. Firefox on my machine at the moment uses ~560
VMAs and this is with only a handful of threads. Are these the numbers
the VM system is optimized for? I think what our people running the
experiments at the customer site saw is that it's not. The VMA
traversal showed up on the profile lists.
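For anybody who wants to check such numbers on their own machine, a
simple way (a sketch, assuming a Linux /proc; count_vmas() is a made-up
helper) is to count the lines of /proc/<pid>/maps. Each line is roughly
one VMA, although a few special entries such as [vsyscall] may be listed
as well.

```c
#include <stdio.h>

/*
 * Hypothetical helper: counts the lines in a maps file such as
 * "/proc/self/maps" or "/proc/1234/maps".  Returns -1 if the file
 * cannot be opened.
 */
static long count_vmas(const char *maps_path)
{
    FILE *f = fopen(maps_path, "r");
    long lines = 0;
    int c;

    if (f == NULL)
        return -1;

    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;

    fclose(f);
    return lines;
}
```

count_vmas("/proc/self/maps") gives the count for the calling process;
for another process, substitute its pid in the path.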
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖