From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ulrich Drepper <drepper@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Jakub Jelinek <jakub@redhat.com>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: missing madvise functionality
Date: Wed, 04 Apr 2007 18:04:30 +1000
Message-ID: <46135C0E.5070803@yahoo.com.au>
In-Reply-To: <461357C4.4010403@yahoo.com.au>

Nick Piggin wrote:
> Ulrich Drepper wrote:
> 
>> People might remember the thread about mysql not scaling and pointing
>> the finger quite happily at glibc.  Well, the situation is not like that.
>>
>> The problem is glibc has to work around kernel limitations.  If the
>> malloc implementation detects that a large chunk of previously allocated
>> memory is now free and unused it wants to return the memory to the
>> system.  What we currently have to do is this:
>>
>>   to free:      mmap(PROT_NONE) over the area
>>   to reuse:     mprotect(PROT_READ|PROT_WRITE)
>>
>> Yep, that's expensive, both operations need to get locks preventing
>> other threads from doing the same.
>>
>> Some people were quick to suggest that we simply avoid the freeing in
>> many situations (that's what the patch submitted by Yanmin Zhang
>> basically does).  That's no solution.  One of the very good properties
>> of the current allocator is that it does not use much memory.
> 
> 
> Does mmap(PROT_NONE) actually free the memory?
> 
> 
>> A solution for this problem is a madvise() operation with the following
>> property:
>>
>>   - the content of the address range can be discarded
>>
>>   - if an access to a page in the range happens in the future it must
>>     succeed.  The old page content can be provided or a new, empty page
>>     can be provided
>>
>> That's it.  The current MADV_DONTNEED doesn't cut it because it zaps the
>> pages, causing *all* future reuses to create page faults.  This is what
>> I guess happens in the mysql test case where the pages were unused and
>> freed but then almost immediately reused.  The page faults erased all
>> the benefits of using one mprotect() call vs a pair of mmap()/mprotect()
>> calls.
> 
> 
> Two questions.
> 
> In the case of pages being unused then almost immediately reused, why is
> it a bad solution to avoid freeing? Is it that you want to avoid
> heuristics because in some cases they could fail and end up using memory?
> 
> Secondly, why is MADV_DONTNEED bad? How much more expensive is a pagefault
> than a syscall? (including the cost of the TLB fill for the memory access
> after the syscall, of course).
> 
> zapping the pages puts them on a nice LIFO cache hot list of pages that
> can be quickly used when the next fault comes in, or used for any other
> allocation in the kernel. Putting them on some sort of reclaim list seems
> a bit pointless.
> 
> Oh, also: something like this patch would help out MADV_DONTNEED, as it
> means it can run concurrently with page faults. I think the locking will
> work (but needs forward porting).

BTW, with that change MADV_DONTNEED becomes much more attractive than
mmap/mprotect can ever be, because those must always take mmap_sem for
writing.
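
For reference, here is a minimal sketch of the free/reuse pair Ulrich
describes. It assumes "mmap(PROT_NONE) over the area" means a MAP_FIXED
anonymous remap of the range; the function names are made up for
illustration and are not glibc's. Both calls change the VMA layout or
protection, which is why they need mmap_sem held for writing.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* "to free": map a fresh PROT_NONE anonymous region over the range.
 * The old pages are dropped.  (Hypothetical helper, not glibc code.) */
int release_with_mmap(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* "to reuse": make the range accessible again. */
int reuse_with_mprotect(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Round trip on a scratch region; len should be a multiple of the page
 * size.  Returns 0 if the region reads back as zeroes after reuse,
 * 1 if it does not, -1 on a syscall failure. */
int demo_mmap_mprotect(size_t len)
{
    assert(len > 0);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, len);              /* touch every page */

    if (release_with_mmap(buf, len) != 0 ||
        reuse_with_mprotect(buf, len) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* First access after reuse faults in fresh zero pages. */
    int zeroed = (buf[0] == 0 && buf[len - 1] == 0);
    munmap(buf, len);
    return zeroed ? 0 : 1;
}
```

Note that this path still takes a page fault on the first touch after
reuse, on top of the two syscalls.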

You don't actually need to protect the ranges unless you are running with
use-after-free debugging turned on, do you?
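
To make the question concrete, here is a rough sketch of the alternative:
leave the range readable and writable and just drop the pages with
madvise(MADV_DONTNEED). It relies on the Linux behaviour for private
anonymous mappings, where the next access after MADV_DONTNEED faults in a
zero-filled page. The helper names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Free the pages but keep the mapping and its protection as they are.
 * (Hypothetical helper, not glibc code.) */
int release_with_dontneed(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}

/* Round trip on a scratch region; len should be a multiple of the page
 * size.  Returns 0 if the region reads back as zeroes after the
 * release and can be written again without any further syscall,
 * 1 if it does not, -1 on a syscall failure. */
int demo_dontneed(size_t len)
{
    assert(len > 0);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, len);              /* touch every page */

    if (release_with_dontneed(buf, len) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* No "reuse" call: the next access simply page-faults. */
    int zeroed = (buf[0] == 0 && buf[len - 1] == 0);
    buf[0] = 1;                          /* plain store, no mprotect() */
    int writable = (buf[0] == 1);

    munmap(buf, len);
    return (zeroed && writable) ? 0 : 1;
}
```

The trade-off is that a stray access to a freed range no longer traps,
which only matters when use-after-free debugging is wanted.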

-- 
SUSE Labs, Novell Inc.
