From: Minchan Kim <minchan@kernel.org>
To: Jason Evans <je@fb.com>
Cc: John Stultz <john.stultz@linaro.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Michel Lespinasse <walken@google.com>,
	Dhaval Giani <dhaval.giani@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Android Kernel Team <kernel-team@android.com>,
	Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
	Andrea Righi <andrea@betterlinux.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
	Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Rob Clark <robdclark@gmail.com>,
	"pliard@google.com" <pliard@google.com>
Subject: Re: [PATCH v10 00/16] Volatile Ranges v10
Date: Tue, 4 Feb 2014 13:58:21 +0900
Message-ID: <20140204045821.GE3481@bbox>
In-Reply-To: <CF1584DE.149CA%je@fb.com>

On Tue, Feb 04, 2014 at 03:08:27AM +0000, Jason Evans wrote:
> On 2/3/14, 5:31 PM, "Minchan Kim" <minchan@kernel.org> wrote:
> >After discussing with Johannes, I'm leaning toward implementing MADV_FREE
> >for Linux instead of the vrange syscall for allocators.
> >The reason I preferred the vrange syscall over MADV_FREE is that vrange
> >is almost O(1), so it's a really lightweight system call, although it
> >needs one more syscall to unmark volatility, while MADV_FREE is
> >O(#pages). But as Johannes pointed out, the kernel trend these days is
> >toward huge pages (e.g., 2M), so I guess the overhead is not really big.
> >
> >(Another topic: if an application wants to use huge pages on Linux,
> >it should mmap a region aligned to the huge page size, but when I read
> >the jemalloc source code, it doesn't seem to. Is there a reason?)
> 
> jemalloc uses 4 MiB naturally aligned chunks by default (chunk size can be
> any power of 2 that is at least two pages), so by default jemalloc does
> align its mappings to huge page boundaries.
> 
> However, chunks have embedded metadata headers, which means that in
> practice, only the second half of each chunk can be madvise()d away if
> only huge pages are in use.  Additionally, the overhead of using even one
> huge page per size class would be unacceptable for most applications (2
> MiB * ~30 size classes * number of active arenas), so adjusting the
> allocator's layout algorithms to use huge pages would require a very
> different strategy than is currently used, and the likelihood of having
> huge pages completely drain of allocations would be quite low.  On top of
> that, the implicit nature of transparent huge pages makes them difficult
> to reliably account for in userland.  In other words, huge pages and
> explicit dirty page purging are for most practical purposes incompatible.

I didn't mean we should use huge pages for every size class; I just wanted
to align chunks to the huge page size. Thanks for the confirmation.
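
For reference, aligning a mapping to the chunk (or huge page) size can be
done by over-allocating and trimming the excess. The following is only a
rough sketch: map_aligned() is a hypothetical helper, not jemalloc's actual
code, and it assumes that size is a multiple of the page size and that align
is a power of two no smaller than the page size.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not taken from jemalloc): map `size` bytes of
 * anonymous memory whose start address is a multiple of `align`.
 * Assumes `size` is a multiple of the page size and `align` is a
 * power of two that is at least the page size.
 */
void *map_aligned(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate so an aligned start address must exist inside. */
    size_t span = size + align;
    if (span < size)
        return NULL; /* size_t overflow */

    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round the start address up to the next multiple of align. */
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    size_t lead = start - (uintptr_t)raw;
    size_t trail = span - lead - size;

    /* Give the unaligned head and the unused tail back to the kernel. */
    if (lead != 0)
        munmap(raw, lead);
    if (trail != 0)
        munmap((char *)start + size, trail);

    return (void *)start;
}
```

With a 4 MiB chunk, map_aligned(4 << 20, 4 << 20) returns an address that is
also aligned to a 2M huge page boundary.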

> 
> >As a bonus, many allocators already have logic to use MADV_FREE,
> >so it's really easy to use if Linux starts to support it.
> 
> MADV_FREE is certainly an easy interface to use, and as long as there
> aren't any serious scalability issues in the implementation (e.g.
> concurrent madvise() calls for disjoint virtual addresses from multiple
> threads should be contention-free), I think it's perfectly adequate.

Of course. Every thread can call madvise(MADV_FREE) in parallel because
the Linux VM takes only the read side of the semaphore for it, not the
write side. Page faulting also takes only the read side, so page faults
and madvise(MADV_FREE) in different threads can run in parallel without
any scalability issue, as long as they don't touch virtual addresses
within the same 4M range, because then they need the same page table
lock. That's very unlikely in an allocator, IMO.

It could, however, block a new chunk allocation, which needs the write
side of the semaphore, but chunk allocation is not common, so I don't
think that's a problem either. So you don't need to change anything other
than enabling JEMALLOC_PURGE_MADVISE_FREE for Linux.
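
On the allocator side, the purge path would then look roughly like the
sketch below. It assumes a Linux MADV_FREE that behaves like the one on the
BSDs. purge_range() is a hypothetical name, and the fallback to
MADV_DONTNEED for kernels or headers without MADV_FREE is my own assumption,
not something jemalloc is known to do.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tell the kernel that the pages in
 * [addr, addr + len) no longer hold useful data. `addr` must be
 * page-aligned. Returns 0 on success, -1 on failure (errno is set).
 */
int purge_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

#ifdef MADV_FREE
    /* Lazy free: the kernel may drop the pages later, under pressure. */
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
    /* EINVAL: the running kernel does not know MADV_FREE. */
#endif
    /* Assumed fallback: drop the pages immediately. */
    return madvise(addr, len, MADV_DONTNEED);
}
```

The range stays mapped after the call, so the allocator can hand it out
again and simply write to it; no second syscall is needed to reuse it.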

> 
> >Do you see any other point where the lightweight vrange syscall is
> >superior to MADV_FREE on a big chunk all at once?
> 
> Other than system call overhead, volatile ranges and MADV_FREE are both
> great for jemalloc's purposes.  MADV_FREE is a bit easier to deal with,
> mainly because volatile ranges are distinct from dirty pages and virtual
> memory coalescing in jemalloc will require some additional work to
> logically treat adjacent volatile/dirty ranges as contiguous, but that's a
> solvable problem.

Okay, if nobody has concerns, I will implement MADV_FREE and report the
test results.
Thanks for the feedback!

> 
> Thanks,
> Jason
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim
