From: Minchan Kim <minchan@kernel.org>
To: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
John Stultz <john.stultz@linaro.org>,
Christoph Lameter <cl@linux.com>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
sanjay@google.com, David Rientjes <rientjes@google.com>
Subject: Re: [RFC v2] Support volatile range for anon vma
Date: Thu, 1 Nov 2012 09:50:52 +0900
Message-ID: <20121101005052.GB26256@bbox>
In-Reply-To: <CAPM31RKm89s6PaAnfySUD-f+eGdoZP6=9DHy58tx_4Zi8Z9WPQ@mail.gmail.com>
Hello,
On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 30 Oct 2012 10:29:54 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> >
> > > This patch introduces new madvise behaviors, MADV_VOLATILE and
> > > MADV_NOVOLATILE, for anonymous pages. It differs from
> > > John Stultz's version, which considers only tmpfs, while this patch
> > > considers only anonymous pages, so this cannot cover John's one.
> > > If the idea below proves reasonable, I hope we can unify both
> > > concepts by madvise/fadvise.
> > >
> > > The rationale is as follows.
> > > Many allocators call munmap(2) when the user calls free(3) if ptr is
> > > in an mmaped area. But munmap isn't cheap because it has to clean up
> > > all pte entries and unlink a vma, so the overhead increases
> > > linearly with the mmaped area's size.
> >
> > Presumably the userspace allocator will internally manage memory in
> > large chunks, so the munmap() call frequency will be much lower than
> > the free() call frequency. So the performance gains from this change
> > might be very small.
>
> I don't think I strictly understand the motivation from a
> malloc-standpoint here.
>
> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> to perform discards on Linux. For any reasonable allocator (short
> of binding malloc --> mmap, free --> unmap) this seems a better
> choice.
>
> Note also from a performance stand-point I doubt any allocator (which
> cares about performance) is going to want to pay the cost of even a
> null syscall about typical malloc/free usage (consider: a tcmalloc
Good point.
> malloc/free pair is currently <20ns). Given then that this cost is
> amortized once you start doing discards on larger blocks MADV_DONTNEED
> seems a preferable interface:
> - You don't need to reconstruct an arena when you do want to allocate
> since there's no munmap/mmap for the region to change about
> - There are no syscalls involved in later reallocating the block.
The benefits above apply to MADV_VOLATILE, too.
But as you pointed out, it has a little more overhead than DONTNEED
because the allocator has to call madvise(MADV_NOVOLATILE) before allocation.
Although madvise(NOVOLATILE) just marks a vma flag, it does need mmap_sem,
which could be a problem for parallel malloc/free workloads, as KOSAKI pointed out.
In that case, we can change the semantics so that malloc doesn't need to call
madvise(NOVOLATILE) before allocating. The page fault handler then has to
check whether the fault was caused by an access to a volatile vma. If so,
it could return a zero page instead of SIGBUS and mark the vma as no longer
volatile.
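For reference, the MADV_DONTNEED reuse path discussed above can be sketched
roughly as follows. This is only a minimal illustration, assuming Linux and a
private anonymous mapping; the helper names (arena_map, arena_discard,
arena_is_zero) are made up for this example and are not taken from tcmalloc or
from the patch. It does not use MADV_VOLATILE/MADV_NOVOLATILE, which exist
only in this RFC.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve a private anonymous arena of len bytes. */
static void *arena_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: the "free" path. The pages are dropped but the
 * mapping itself stays, so no munmap/mmap pair is needed to reuse it.
 */
static int arena_discard(void *p, size_t len)
{
    return madvise(p, len, MADV_DONTNEED);
}

/* Hypothetical helper: return 1 if every byte in the range is zero. */
static int arena_is_zero(const void *p, size_t len)
{
    const unsigned char *b = p;

    for (size_t i = 0; i < len; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

After arena_discard() the range can simply be touched again: on Linux a
private anonymous mapping is refilled with zero pages on the next access, so
reallocating the block needs no further syscall. The old contents are gone
unconditionally, which is the main difference from the volatile proposal.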
>
> The only real additional cost is address-space. Are you strongly
> concerned about the 32-bit case?
No. I believe allocators have logic to clean up such regions once the address
space is almost full.
Thanks, Paul.
--
Kind regards,
Minchan Kim