From: Minchan Kim <minchan@kernel.org>
To: Shaohua Li <shli@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-api@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	zhangyanfei@cn.fujitsu.com, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Jason Evans <je@fb.com>, Daniel Micay <danielmicay@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Michal Hocko <mhocko@suse.cz>,
	yalin.wang2010@gmail.com, "Wang,
	Yalin" <Yalin.Wang@sonymobile.com>
Subject: Re: [PATCH 5/8] mm: move lazily freed pages to inactive list
Date: Thu, 5 Nov 2015 10:03:29 +0900	[thread overview]
Message-ID: <20151105010329.GF7357@bbox> (raw)
In-Reply-To: <20151104175342.GA98327@kernel.org>
On Wed, Nov 04, 2015 at 09:53:42AM -0800, Shaohua Li wrote:
> On Tue, Nov 03, 2015 at 09:52:23AM +0900, Minchan Kim wrote:
> > On Fri, Oct 30, 2015 at 10:22:12AM -0700, Shaohua Li wrote:
> > > On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote:
> > > > MADV_FREE is a hint that it's okay to discard pages if there is memory
> > > > pressure and we use reclaimers(ie, kswapd and direct reclaim) to free them
> > > > so there is no value keeping them in the active anonymous LRU so this
> > > > patch moves them to inactive LRU list's head.
> > > > 
> > > > This means that MADV_FREE-ed pages which were living on the inactive list
> > > > are reclaimed first because they are more likely to be cold rather than
> > > > recently active pages.
> > > > 
> > > > An arguable issue for the approach would be whether we should put the page
> > > > to the head or tail of the inactive list.  I chose head because the kernel
> > > > cannot make sure it's really cold or warm for every MADV_FREE usecase but
> > > > at least we know it's not *hot*, so landing of inactive head would be a
> > > > comprimise for various usecases.
> > > > 
> > > > This fixes suboptimal behavior of MADV_FREE when pages living on the
> > > > active list will sit there for a long time even under memory pressure
> > > > while the inactive list is reclaimed heavily.  This basically breaks the
> > > > whole purpose of using MADV_FREE to help the system to free memory which
> > > > is might not be used.
> > > 
> > > My main concern is the policy how we should treat the FREE pages. Moving it to
> > > inactive lru is definitionly a good start, I'm wondering if it's enough. The
> > > MADV_FREE increases memory pressure and cause unnecessary reclaim because of
> > > the lazy memory free. While MADV_FREE is intended to be a better replacement of
> > > MADV_DONTNEED, MADV_DONTNEED doesn't have the memory pressure issue as it free
> > > memory immediately. So I hope the MADV_FREE doesn't have impact on memory
> > > pressure too. I'm thinking of adding an extra lru list and wartermark for this
> > > to make sure FREE pages can be freed before system wide page reclaim. As you
> > > said, this is arguable, but I hope we can discuss about this issue more.
> > 
> > Yes, it's arguble. ;-)
> > 
> > It seems the divergence comes from MADV_FREE is *replacement* of MADV_DONTNEED.
> > But I don't think so. If we could discard MADV_FREEed page *anytime*, I agree
> > but it's not true because the page would be dirty state when VM want to reclaim. 
> 
> There certainly are other usage cases, but even your patch log mainly describes
> the jemalloc usage case, which uses MADV_DONTNEED.
> 
> > I'm also against with your's suggestion which let's discard FREEed page before
> > system wide page reclaim because system would have lots of clean cold page
> > caches or anonymous pages. In such case, reclaiming of them would be better.
> > Yeb, it's really workload-dependent so we might need some heuristic which is
> > normally what we want to avoid.
> > 
> > Having said that, I agree with you we could do better than the deactivation
> > and frankly speaking, I'm thinking of another LRU list(e.g. tentatively named
> > "ezreclaim LRU list"). What I have in mind is to age (anon|file|ez)
> > fairly. IOW, I want to percolate ez-LRU list reclaiming into get_scan_count.
> > When the MADV_FREE is called, we could move hinted pages from anon-LRU to
> > ez-LRU and then If VM find to not be able to discard a page in ez-LRU,
> > it could promote it to acive-anon-LRU which would be very natural aging
> > concept because it mean someone touches the page recenlty.
> > 
> > With that, I don't want to bias one side and don't want to add some knob for
> > tuning the heuristic but let's rely on common fair aging scheme of VM.
> > 
> > Another bonus with new LRU list is we could support MADV_FREE on swapless
> > system.
> > 
> > > 
> > > Or do you want to push this first and address the policy issue later?
> > 
> > I believe adding new LRU list would be controversial(ie, not trivial)
> > for maintainer POV even though code wouldn't be complicated.
> > So, I want to see problems in *real practice*, not any theoritical
> > test program before diving into that.
> > To see such voice of request, we should release the syscall.
> > So, I want to push this first.
> 
> The memory pressure issue isn't just in artificial test. In jemalloc, there is
> a knob (lg_dirty_mult) to control the rate memory should be purged (using
> MADV_DONTNEED). We already had several reports in our production environment
> changing the knob can cause extra memory usage (and swap and so on). If
> jemalloc uses MADV_FREE, jemalloc will not purge any memory, which is equivent
> to disable current MADV_DONTNEED (eg, lg_dirty_mult = -1). I'm sure this will
> cause the similar issue, eg (extram memory usage, swap). That said I don't
> object to push this first, but the memory pressue issue can happen in real
> production, I hope it's not ignored.
Absolutely, I'm not saying I want to ignore the concern.
Adding new LRU would make churning of many part in MM so before that,
let's see the voice from userland and discuss what's the best if it
has trouble.
> 
> Thanks,
> Shaohua
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply	other threads:[~2015-11-05  1:03 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-30  7:01 [PATCH 0/8] MADV_FREE support Minchan Kim
2015-10-30  7:01 ` [PATCH 1/8] mm: support madvise(MADV_FREE) Minchan Kim
2015-10-30 16:49   ` Shaohua Li
2015-11-03  0:10     ` Minchan Kim
2015-10-30  7:01 ` [PATCH 2/8] mm: define MADV_FREE for some arches Minchan Kim
2015-10-30  7:01 ` [PATCH 3/8] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures Minchan Kim
2015-11-02  0:08   ` Hugh Dickins
2015-11-03  2:32     ` Minchan Kim
2015-11-03  2:36       ` Minchan Kim
2015-11-03  3:36         ` David Miller
2015-11-03  4:31           ` Minchan Kim
2015-10-30  7:01 ` [PATCH 4/8] mm: free swp_entry in madvise_free Minchan Kim
2015-10-30 12:28   ` Michal Hocko
2015-11-03  0:53     ` Minchan Kim
2015-10-30  7:01 ` [PATCH 5/8] mm: move lazily freed pages to inactive list Minchan Kim
2015-10-30 17:22   ` Shaohua Li
2015-11-03  0:52     ` Minchan Kim
2015-11-04  8:15       ` Michal Hocko
2015-11-04 17:53       ` Shaohua Li
2015-11-04 18:20         ` Shaohua Li
2015-11-05  1:11           ` Minchan Kim
2015-11-05  1:03         ` Minchan Kim [this message]
2015-11-04 20:55   ` Johannes Weiner
2015-11-04 21:48     ` Daniel Micay
2015-11-04 22:55       ` Johannes Weiner
2015-11-04 23:36         ` Daniel Micay
2015-11-04 23:49           ` Daniel Micay
2015-10-30  7:01 ` [PATCH 6/8] mm: lru_deactivate_fn should clear PG_referenced Minchan Kim
2015-10-30 12:47   ` Michal Hocko
2015-11-03  1:10     ` Minchan Kim
2015-11-04  8:22       ` Michal Hocko
2015-10-30  7:01 ` [PATCH 7/8] mm: clear PG_dirty to mark page freeable Minchan Kim
2015-10-30 12:55   ` Michal Hocko
2015-10-30  7:01 ` [PATCH 8/8] mm: mark stable page dirty in KSM Minchan Kim
2015-11-01  4:51 ` [PATCH 0/8] MADV_FREE support David Rientjes
2015-11-01  6:29   ` Daniel Micay
2015-11-03  2:23     ` Minchan Kim
2015-11-04 20:19     ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=20151105010329.GF7357@bbox \
    --to=minchan@kernel.org \
    --cc=Yalin.Wang@sonymobile.com \
    --cc=akpm@linux-foundation.org \
    --cc=danielmicay@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=je@fb.com \
    --cc=kirill@shutemov.name \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=mtk.manpages@gmail.com \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    --cc=yalin.wang2010@gmail.com \
    --cc=zhangyanfei@cn.fujitsu.com \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).