linux-mm.kvack.org archive mirror
From: Daniel Micay <danielmicay@gmail.com>
To: Shaohua Li <shli@kernel.org>, Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-api@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Jason Evans <je@fb.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Michal Hocko <mhocko@suse.cz>,
	yalin.wang2010@gmail.com, bmaurer@fb.com
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
Date: Wed, 4 Nov 2015 16:16:01 -0500
Message-ID: <563A7591.7080607@gmail.com>
In-Reply-To: <20151104200006.GA46783@kernel.org>

> Compared to MADV_DONTNEED, MADV_FREE's lazy memory free is a huge win for
> reducing page faults. But one issue remains: the TLB flush. Both MADV_DONTNEED
> and MADV_FREE do a TLB flush, and TLB flush overhead is quite large in
> contemporary multithreaded applications. In our production workload, we
> sometimes observed 80% of CPU time spent on TLB flushes triggered by jemalloc's
> madvise(MADV_DONTNEED). We haven't tested MADV_FREE yet, but the result should
> be similar. It's hard to avoid the TLB flush with MADV_FREE, because the flush
> helps avoid data corruption.
> 
> The new proposal tries to fix the TLB issue. We introduce two madvise verbs:
> 
> MARK_FREE. Userspace notifies the kernel that the memory range can be
> discarded. At this stage the kernel just records the range. Should memory
> pressure happen, page reclaim can free the memory directly, regardless of the
> pte state.
>
> MARK_NOFREE. Userspace notifies the kernel that the memory range will be reused
> soon. The kernel deletes the record and prevents page reclaim from discarding
> the memory. If the memory wasn't reclaimed, userspace accesses the old memory;
> otherwise the access goes through normal page fault handling.
> 
> The point is to let userspace tell the kernel whether memory can be discarded,
> instead of depending on the pte dirty bit as MADV_FREE does. With these, no TLB
> flush is required until page reclaim actually frees the memory (page reclaim
> needs to do the TLB flush for MADV_FREE too). It still preserves the lazy
> memory free merit of MADV_FREE.
> 
> Compared to MADV_FREE, reusing memory with the new proposal isn't transparent,
> e.g. userspace must call MARK_NOFREE. But it's easy to use the new API in
> jemalloc.
>
> We don't have code to back this up yet, sorry. We'd like to discuss it if it
> makes sense.

That's comparable to Android's pinning / unpinning API for ashmem, and I
think it makes sense if it's faster. It's different from the MADV_FREE
API though, because the new allocations that are handed out won't have
the usual lazy commit which MADV_FREE provides. With MADV_FREE, pages in
an allocation that's handed out can still be dropped until they are
actually written to. The allocation is considered active by jemalloc
either way, but only a subset of the active pages are actually
committed. There's probably a use case for both of these systems.
