From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Shaohua Li <shli@kernel.org>,
Yalin.Wang@sonymobile.com, Hugh Dickins <hughd@google.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page
Date: Fri, 10 Apr 2015 09:08:12 +0900
Message-ID: <20150410000759.GA30287@blaptop>
In-Reply-To: <20150409135939.bbc9025d925de9d0fdd12797@linux-foundation.org>
Hello Andrew,
On Thu, Apr 09, 2015 at 01:59:39PM -0700, Andrew Morton wrote:
> On Thu, 9 Apr 2015 08:50:25 +0900 Minchan Kim <minchan@kernel.org> wrote:
>
> > Bump.
>
> I'm getting the feeling that MADV_FREE is out of control.
>
> Below is the overall rollup of
>
> mm-support-madvisemadv_free.patch
> mm-support-madvisemadv_free-fix.patch
> mm-support-madvisemadv_free-fix-2.patch
> mm-dont-split-thp-page-when-syscall-is-called.patch
> mm-dont-split-thp-page-when-syscall-is-called-fix.patch
> mm-dont-split-thp-page-when-syscall-is-called-fix-2.patch
> mm-free-swp_entry-in-madvise_free.patch
> mm-move-lazy-free-pages-to-inactive-list.patch
> mm-move-lazy-free-pages-to-inactive-list-fix.patch
> mm-move-lazy-free-pages-to-inactive-list-fix-fix.patch
> mm-move-lazy-free-pages-to-inactive-list-fix-fix-fix.patch
> mm-make-every-pte-dirty-on-do_swap_page.patch
>
>
> It's pretty large and has its sticky little paws in all sorts of places.
>
>
> The feature would need to be pretty darn useful to justify a mainline
> merge. Has any such usefulness been demonstrated?
jemalloc has long used MADV_FREE instead of MADV_DONTNEED on the OSes
that support it (FreeBSD, Solaris, Darwin, Windows).
It uses MADV_DONTNEED only on Linux, because Linux lacks the feature.
========================== &< ===========================
jemalloc:
/*
 * Methods for purging unused pages differ between operating systems.
 *
 *   madvise(..., MADV_DONTNEED) : On Linux, this immediately discards pages,
 *                                 such that new pages will be demand-zeroed if
 *                                 the address region is later touched.
 *   madvise(..., MADV_FREE) : On FreeBSD and Darwin, this marks pages as being
 *                             unused, such that they will be discarded rather
 *                             than swapped out.
 */
...
bool
pages_purge(void *addr, size_t length)
{
        bool unzeroed;
#ifdef _WIN32
        VirtualAlloc(addr, length, MEM_RESET, PAGE_READWRITE);
        unzeroed = true;
#elif defined(JEMALLOC_HAVE_MADVISE)
#  ifdef JEMALLOC_PURGE_MADVISE_DONTNEED
#    define JEMALLOC_MADV_PURGE MADV_DONTNEED
#    define JEMALLOC_MADV_ZEROS true
#  elif defined(JEMALLOC_PURGE_MADVISE_FREE)
#    define JEMALLOC_MADV_PURGE MADV_FREE
#    define JEMALLOC_MADV_ZEROS false
#  else
#    error "No madvise(2) flag defined for purging unused dirty pages."
#  endif
        int err = madvise(addr, length, JEMALLOC_MADV_PURGE);
        unzeroed = (!JEMALLOC_MADV_ZEROS || err != 0);
#  undef JEMALLOC_MADV_PURGE
#  undef JEMALLOC_MADV_ZEROS
#else
        /* Last resort no-op. */
        unzeroed = true;
#endif
        return (unzeroed);
}
TCMalloc is in the same position:
========================== &< ===========================
// MADV_FREE is specifically designed for use by malloc(), but only
// FreeBSD supports it; in linux we fall back to the somewhat inferior
// MADV_DONTNEED.
#if !defined(MADV_FREE) && defined(MADV_DONTNEED)
# define MADV_FREE MADV_DONTNEED
#endif
..
bool TCMalloc_SystemRelease(void* start, size_t length) {
#ifdef MADV_FREE
  if (FLAGS_malloc_devmem_start) {
    // It's not safe to use MADV_FREE/MADV_DONTNEED if we've been
    // mapping /dev/mem for heap memory.
    return false;
  }
  if (FLAGS_malloc_disable_memory_release) return false;
  if (pagesize == 0) pagesize = getpagesize();
  const size_t pagemask = pagesize - 1;
  size_t new_start = reinterpret_cast<size_t>(start);
  size_t end = new_start + length;
  size_t new_end = end;
  // Round up the starting address and round down the ending address
  // to be page aligned:
  new_start = (new_start + pagesize - 1) & ~pagemask;
  new_end = new_end & ~pagemask;
  ASSERT((new_start & pagemask) == 0);
  ASSERT((new_end & pagemask) == 0);
  ASSERT(new_start >= reinterpret_cast<size_t>(start));
  ASSERT(new_end <= end);
  if (new_end > new_start) {
    int result;
    do {
      result = madvise(reinterpret_cast<char*>(new_start),
                       new_end - new_start, MADV_FREE);
    } while (result == -1 && errno == EAGAIN);
    return result != -1;
  }
#endif
  return false;
}
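The rounding step in TCMalloc_SystemRelease() matters because madvise()
works on whole pages, so the range has to shrink inward or bytes outside
the released region would be discarded too. A minimal sketch of that
arithmetic in C follows. page_align_inward() is a hypothetical name, not a
tcmalloc function, and it assumes pagesize is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from tcmalloc): shrink [start, start + length)
 * to the largest page-aligned subrange it contains.
 * Assumes pagesize is a power of two.
 * Returns false if no whole page fits inside the range.
 */
static bool
page_align_inward(uintptr_t start, size_t length, size_t pagesize,
    uintptr_t *out_start, uintptr_t *out_end)
{
    const uintptr_t pagemask = (uintptr_t)pagesize - 1;
    const uintptr_t end = start + length;
    const uintptr_t new_start = (start + pagemask) & ~pagemask; /* round up */
    const uintptr_t new_end = end & ~pagemask;                  /* round down */

    if (new_end <= new_start)
        return false;
    *out_start = new_start;
    *out_end = new_end;
    return true;
}
```

With 4096-byte pages, start 0x1234 and length 0x3000 give the range
[0x2000, 0x4000), i.e. two whole pages. Start 0x1001 with length 0x1000
contains no whole page, so nothing would be passed to madvise().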
glibc wants it, too:
https://sourceware.org/ml/libc-alpha/2015-02/msg00197.html
--
Kind regards,
Minchan Kim