From: Ulrich Drepper <drepper@redhat.com>
To: Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Subject: missing madvise functionality
Date: Tue, 03 Apr 2007 09:26:57 -0700
Message-ID: <46128051.9000609@redhat.com>
People might remember the thread about MySQL not scaling and pointing
the finger quite happily at glibc. Well, the situation is not like that.
The problem is that glibc has to work around kernel limitations. If the
malloc implementation detects that a large chunk of previously allocated
memory is now free and unused, it wants to return the memory to the
system. What we currently have to do is this:
    to free:   mmap(PROT_NONE) over the area
    to reuse:  mprotect(PROT_READ|PROT_WRITE)
Yep, that's expensive: both operations need to take locks that prevent
other threads from doing the same.
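For readers who want to see the two steps spelled out, here is a minimal
sketch in C. The helper names region_alloc(), region_release() and
region_reuse() are made up for illustration and are not glibc's internal
functions; the sketch assumes Linux and a private anonymous mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve an anonymous, private, read/write region. */
static void *region_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "to free": map fresh PROT_NONE pages over the area. MAP_FIXED replaces
 * the old mapping, so the old pages are dropped while the address range
 * itself stays reserved. */
static int region_release(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    /* MAP_FIXED requires a page-aligned address. */
    assert(page > 0 && (uintptr_t)addr % (size_t)page == 0);

    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* "to reuse": make the range accessible again. */
static int region_reuse(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```

A caller would pair the two: region_release() when a large chunk becomes
free, region_reuse() before handing the range out again. Both calls change
the process's memory map, which is where the locking cost comes from.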
Some people were quick to suggest that we simply avoid the freeing in
many situations (that's what the patch submitted by Yanmin Zhang
basically does). That's no solution. One of the very good properties
of the current allocator is that it does not use much memory.
A solution for this problem is a madvise() operation with the following
property:
- the content of the address range can be discarded
- if an access to a page in the range happens in the future it must
  succeed. The old page content can be provided or a new, empty page
  can be provided
That's it. The current MADV_DONTNEED doesn't cut it because it zaps the
pages, causing *all* future reuses to create page faults. This is what
I guess happens in the MySQL test case where the pages were unused and
freed but then almost immediately reused. The page faults erased all
the benefits of using one mprotect() call vs a pair of mmap()/mprotect()
calls.
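The MADV_DONTNEED behaviour described above can be observed with a small
sketch like the following. The function names region_discard(),
discard_demo() and faults_after_discard() are invented for this example.
It assumes Linux and a private anonymous mapping, where the pages are
dropped and the next access faults in a fresh zero-filled page. The fault
counter relies on getrusage() minor-fault accounting, which may not be
meaningful in every environment.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: tell the kernel the range's content is not needed.
 * The mapping stays valid, so no mprotect() is required before reuse. */
static int region_discard(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}

/* Writes a byte, discards the range, reads the byte back.
 * Returns the value read after the discard, or -1 on error. */
static int discard_demo(void)
{
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 42;
    assert(p[0] == 42);
    if (region_discard((void *)p, len) != 0) {
        munmap((void *)p, len);
        return -1;
    }
    int after = p[0];          /* the access succeeds, via a page fault */
    munmap((void *)p, len);
    return after;
}

/* Touches npages pages again after a discard and returns how many minor
 * faults the process took while doing so, or -1 on error. */
static long faults_after_discard(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    struct rusage before, after;
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < len; i += page)
        p[i] = 1;              /* first use: populate every page */
    if (region_discard((void *)p, len) != 0 ||
        getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }
    for (size_t i = 0; i < len; i += page)
        p[i] = 1;              /* reuse: every page has to be faulted in again */
    int rc = getrusage(RUSAGE_SELF, &after);
    munmap((void *)p, len);
    return rc != 0 ? -1 : after.ru_minflt - before.ru_minflt;
}
```

Calling faults_after_discard() with a page count and comparing the result
against that count gives a rough idea of how many faults the reuse costs.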
So, if all those who were so interested in that micro benchmark could
now please direct their attention to a good madvise solution I'd be much
obliged. It'll be put to good use right away and it should be quite
easy to provide a glibc patch to test the new kernel code.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖