From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>,
    Andrew Morton <akpm@linux-foundation.org>,
    linux-mm <linux-mm@kvack.org>,
    LKML <linux-kernel@vger.kernel.org>,
    linux-api@vger.kernel.org, Tim Murray <timmurray@google.com>,
    Joel Fernandes <joel@joelfernandes.org>,
    Suren Baghdasaryan <surenb@google.com>,
    Daniel Colascione <dancol@google.com>,
    Shakeel Butt <shakeelb@google.com>,
    Sonny Rao <sonnyrao@google.com>,
    Brian Geffon <bgeffon@google.com>,
    jannh@google.com, oleg@redhat.com, christian@brauner.io,
    oleksandr@redhat.com, hdanton@sina.com
Subject: Re: [RFCv2 1/6] mm: introduce MADV_COLD
Date: Mon, 3 Jun 2019 13:27:17 -0400
Message-ID: <20190603172717.GA30363@cmpxchg.org>
In-Reply-To: <20190603071607.GB4531@dhcp22.suse.cz>

On Mon, Jun 03, 2019 at 09:16:07AM +0200, Michal Hocko wrote:
> On Fri 31-05-19 23:34:07, Minchan Kim wrote:
> > On Fri, May 31, 2019 at 04:03:32PM +0200, Michal Hocko wrote:
> > > On Fri 31-05-19 22:39:04, Minchan Kim wrote:
> > > > On Fri, May 31, 2019 at 10:47:52AM +0200, Michal Hocko wrote:
> > > > > On Fri 31-05-19 15:43:08, Minchan Kim wrote:
> > > > > > When a process expects no accesses to a certain memory range, it could
> > > > > > give a hint to kernel that the pages can be reclaimed when memory pressure
> > > > > > happens but data should be preserved for future use. This could reduce
> > > > > > workingset eviction so it ends up increasing performance.
> > > > > >
> > > > > > This patch introduces the new MADV_COLD hint to madvise(2) syscall.
> > > > > > MADV_COLD can be used by a process to mark a memory range as not expected
> > > > > > to be used in the near future. The hint can help kernel in deciding which
> > > > > > pages to evict early during memory pressure.
> > > > > >
> > > > > > Internally, it works via deactivating pages from active list to inactive's
> > > > > > head if the page is private because inactive list could be full of
> > > > > > used-once pages which are first candidate for the reclaiming and that's a
> > > > > > reason why MADV_FREE move pages to head of inactive LRU list. Therefore,
> > > > > > if the memory pressure happens, they will be reclaimed earlier than other
> > > > > > active pages unless there is no access until the time.
> > > > >
> > > > > [I am intentionally not looking at the implementation because below
> > > > > points should be clear from the changelog - sorry about nagging ;)]
> > > > >
> > > > > What kind of pages can be deactivated? Anonymous/File backed.
> > > > > Private/shared? If shared, are there any restrictions?
> > > >
> > > > Both file and private pages could be deactivated from each active LRU
> > > > to each inactive LRU if the page has one map_count. In other words,
> > > >
> > > >     if (page_mapcount(page) <= 1)
> > > >         deactivate_page(page);
> > >
> > > Why do we restrict to pages that are single mapped?
> >
> > Because page table in one of process shared the page would have access bit
> > so finally we couldn't reclaim the page. The more process it is shared,
> > the more fail to reclaim.
>
> So what? In other words why should it be restricted solely based on the
> map count. I can see a reason to restrict based on the access
> permissions because we do not want to simplify all sorts of side channel
> attacks but memory reclaim is capable of reclaiming shared pages and so
> far I haven't heard any sound argument why madvise should skip those.
> Again if there are any reasons, then document them in the changelog.

I think the mapcount restriction makes sense. It could be explained, but it also follows
established madvise semantics, and I'm not sure it's necessarily
Minchan's job to reiterate those.

Sharing isn't exactly transparent to userspace. The kernel does COW,
KSM, etc. When you madvise, you can really only speak for your own
reference to that memory: "*I* am not using this."

This is in line with other madvise calls: MADV_DONTNEED clears the
local page table entries and drops the corresponding references, so
shared pages won't get freed. MADV_FREE clears the PTE dirty bit and
also has explicit mapcount checks before clearing PG_dirty, so again
shared pages don't get freed.