From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jakub Jelinek <jakub@redhat.com>,
Ulrich Drepper <drepper@redhat.com>,
Andi Kleen <andi@firstfloor.org>, Rik van Riel <riel@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Hugh Dickins <hugh@veritas.com>
Subject: Re: missing madvise functionality
Date: Wed, 04 Apr 2007 12:22:00 +1000
Message-ID: <46130BC8.9050905@yahoo.com.au>
In-Reply-To: <4612DCC6.7000504@cosmosbay.com>
Eric Dumazet wrote:
> Andrew Morton wrote:
>
>> On Tue, 3 Apr 2007 16:29:37 -0400
>> Jakub Jelinek <jakub@redhat.com> wrote:
>>
>>> On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
>>>
>>>> Andrew Morton wrote:
>>>>
>>>>> Ulrich, could you suggest a little test app which would demonstrate this
>>>>> behaviour?
>>>>
>>>> It's not really reliably possible to demonstrate this with a small
>>>> program using malloc. You'd need something like this mysql test case
>>>> which Rik said is not hard to run by yourself.
>>>>
>>>> If somebody adds a kernel interface I can easily produce a glibc patch
>>>> so that the test can be run in the new environment.
>>>>
>>>> But it's of course easy enough to simulate the specific problem in a
>>>> micro benchmark. If you want that let me know.
>>>
>>> I think something like the following testcase, which simulates what free
>>> and malloc do when trimming/growing a non-main arena, would work.
>>>
>>> My guess is that all the page zeroing is pretty expensive as well and
>>> takes significant time, but I haven't profiled it.
>>>
>>> #include <pthread.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>> #include <unistd.h>
>>>
>>> void *
>>> tf (void *arg)
>>> {
>>>   (void) arg;
>>>   size_t ps = sysconf (_SC_PAGE_SIZE);
>>>   void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
>>>                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>   if (p == MAP_FAILED)
>>>     exit (1);
>>>   int i;
>>>   for (i = 0; i < 100000; i++)
>>>     {
>>>       /* Pretend to use the buffer. */
>>>       char *q, *r = (char *) p + 128 * ps;
>>>       size_t s;
>>>       for (q = (char *) p; q < r; q += ps)
>>>         *q = 1;
>>>       for (s = 0, q = (char *) p; q < r; q += ps)
>>>         s += *q;
>>>       /* Free it. Replace this mmap with
>>>          madvise (p, 128 * ps, MADV_THROWAWAY) when implemented. */
>>>       if (mmap (p, 128 * ps, PROT_NONE,
>>>                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p)
>>>         exit (2);
>>>       /* And immediately malloc again. This would then be deleted. */
>>>       if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE))
>>>         exit (3);
>>>     }
>>>   return NULL;
>>> }
>>>
>>> int
>>> main (void)
>>> {
>>>   pthread_t th[32];
>>>   int i;
>>>   for (i = 0; i < 32; i++)
>>>     if (pthread_create (&th[i], NULL, tf, NULL))
>>>       exit (4);
>>>   for (i = 0; i < 32; i++)
>>>     pthread_join (th[i], NULL);
>>>   return 0;
>>> }
>>>
>>
>> whee. 135,000 context switches/sec on a slow 2-way. mmap_sem, most
>> likely. That is ungood.
>>
>> Did anyone monitor the context switch rate with the mysql test?
>>
>> Interestingly, your test app (with s/100000/1000) runs to completion in 13
>> seconds on the slow 2-way. On a fast 8-way, it took 52 seconds and
>> sustained 40,000 context switches/sec. That's a bit unexpected.
>>
>> Both machines show ~8% idle time, too :(
>
>
> Yes... then add to this some futex work, and you get the picture.
>
> I do think such workloads might benefit from a vma_cache not shared by
> all threads but private to each thread. A sequence could invalidate the
> cache(s).
>
> i.e. instead of a mm->mmap_cache, having a mm->sequence, and each thread
> having a current->mmap_cache and current->mm_sequence.
I have a patchset to do exactly this, btw.
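For illustration only, here is a rough userspace sketch of the scheme Eric describes. It is not the patchset mentioned above. A shared sequence counter is bumped on every layout change, and each thread trusts its private cached entry only while the sequence it recorded still matches. The names struct space, struct thread_cache, lookup and change_layout are made up for this sketch and are not kernel identifiers. The sketch also ignores locking and object lifetime, which a real implementation would have to handle.

```c
/* Userspace sketch: per-thread lookup cache invalidated by a shared
   sequence counter.  All names are hypothetical, not kernel code.  */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct region { unsigned long start, end; };   /* [start, end) */

struct space                      /* stands in for the shared mm */
{
  struct region regions[4];
  size_t nregions;
  atomic_ulong sequence;          /* bumped on every layout change */
};

struct thread_cache               /* stands in for per-thread fields */
{
  const struct region *cached;
  unsigned long seen_sequence;
};

/* Uncached path: linear search over all regions.  */
static const struct region *
slow_lookup (const struct space *sp, unsigned long addr)
{
  for (size_t i = 0; i < sp->nregions; i++)
    if (addr >= sp->regions[i].start && addr < sp->regions[i].end)
      return &sp->regions[i];
  return NULL;
}

/* Use the private cache only if no layout change happened since it
   was filled; otherwise fall back and refill.  *hit reports which
   path was taken.  */
static const struct region *
lookup (struct space *sp, struct thread_cache *tc,
        unsigned long addr, int *hit)
{
  unsigned long seq = atomic_load (&sp->sequence);
  const struct region *r = tc->cached;

  if (r && tc->seen_sequence == seq && addr >= r->start && addr < r->end)
    {
      *hit = 1;
      return r;
    }
  *hit = 0;
  r = slow_lookup (sp, addr);
  tc->cached = r;
  tc->seen_sequence = seq;
  return r;
}

/* Any layout change bumps the sequence, which implicitly invalidates
   every thread's private cache without touching those caches.  */
static void
change_layout (struct space *sp, size_t idx,
               unsigned long start, unsigned long end)
{
  assert (idx < sp->nregions);
  sp->regions[idx].start = start;
  sp->regions[idx].end = end;
  atomic_fetch_add (&sp->sequence, 1);
}
```

The point of the design is that the writer never has to walk the per-thread caches: a single counter increment is enough, and each reader discovers the staleness on its next lookup.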
Anyway, what is the status of the private futex work? I don't think that
is very intrusive or complicated, so it should get merged ASAP (so then
at least we have the interface there).
--
SUSE Labs, Novell Inc.