From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Minchan Kim <minchan@kernel.org>
Subject: [RFC 0/6] MADV_FREE: respect pte_dirty, not PG_dirty.
Date: Wed,  3 Jun 2015 15:15:39 +0900
Message-ID: <1433312145-19386-1-git-send-email-minchan@kernel.org>

MADV_FREE relies on the dirty bit in the page table entry to decide
whether the VM may discard a page.
IOW, if the page table entry has the dirty bit set, the VM must not
discard the page.

However, as one example, if a swap-in happens via a read fault, the
page table entry pointing to the page does not have the dirty bit
set, so MADV_FREE might wrongly discard the page.

To avoid the problem, MADV_FREE did additional checks with PageDirty
and PageSwapCache. That worked because a swapped-in page lives in
the swap cache and, once it is removed from the swap cache, the page
has the PG_dirty flag. So the two page flag checks together
effectively prevent MADV_FREE from wrongly discarding the page.

A problem with the above logic is that swapped-in pages have PG_dirty
once they are removed from the swap cache, so the VM can no longer
consider those pages freeable even if madvise_free is called later.
See the example below for details.

ptr = malloc();
memset(ptr);
..
..
.. heavy memory pressure, so all of the pages are swapped out
..
..
var = *ptr; -> a page is swapped in and removed from the swap cache.
               The page table entry is not marked dirty, but the
               page descriptor has PG_dirty
..
..
madvise_free(ptr);
..
..
..
.. heavy memory pressure again.
.. This time, the VM cannot discard the page because the page
.. has *PG_dirty*
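
For anyone who wants to poke at this from userspace, the sequence
above might be sketched roughly as below. This is only a sketch under
assumptions: touch_then_madv_free() is a hypothetical helper name, it
uses an anonymous mmap() rather than malloc() so the range is page
aligned, the fallback MADV_FREE value of 8 is taken from the generic
Linux UAPI header, and it cannot force the swap-out/swap-in steps,
which depend on real memory pressure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* assumption: value from the generic Linux UAPI header */
#endif

/*
 * Hypothetical helper: dirty an anonymous region, read it back, then
 * hint that its contents are disposable.
 * Returns 0 on success, or the errno value of the failing call.
 */
static int touch_then_madv_free(size_t len)
{
    unsigned char *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return errno;

    memset(ptr, 0xa5, len);        /* write access: pages become dirty */

    /* ... heavy memory pressure could swap the pages out here ... */

    unsigned char var = ptr[0];    /* read access; a read fault if swapped out */
    assert(var == 0xa5);           /* data must be intact before MADV_FREE */
    (void)var;

    int rc = 0;
    if (madvise(ptr, len, MADV_FREE) != 0)
        rc = errno;                /* e.g. EINVAL on kernels without MADV_FREE */

    munmap(ptr, len);
    return rc;
}
```

Calling touch_then_madv_free(4 * 4096) on a kernel without MADV_FREE
support is expected to return an error code rather than 0.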

Rather than relying on PG_dirty in the page descriptor to prevent
discarding a page, using the dirty bit in the page table is more
straightforward and simple.

So, this patchset tries to keep the page table entry's dirty bit
marked so that MADV_FREE doesn't need to take care of PG_dirty.
To do that, it fixes several cases (e.g., KSM, migration, swapin,
swapoff) and then, finally, it makes MADV_FREE simple.
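
To make the difference concrete, here is a toy model of the two
checks, replaying the timeline from the example. It is not kernel
code: struct toy_page, the can_discard_*() predicates and the other
helpers are made-up names, and the model assumes madvise_free only
clears the dirty bit in the page table entry and leaves PG_dirty
alone, as this cover letter describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page state; field names are illustrative, not kernel structures. */
struct toy_page {
    bool pte_dirty;     /* dirty bit in the page table entry */
    bool pg_dirty;      /* PG_dirty in the page descriptor */
    bool in_swapcache;  /* PageSwapCache */
};

/* Existing check as described above: PTE bit plus both page flags. */
static bool can_discard_old(const struct toy_page *p)
{
    return !p->pte_dirty && !p->pg_dirty && !p->in_swapcache;
}

/* Check this series aims for: the PTE dirty bit alone. */
static bool can_discard_new(const struct toy_page *p)
{
    return !p->pte_dirty;
}

/* Swap-in by read fault, followed by removal from the swap cache. */
static void toy_swap_in(struct toy_page *p, bool mark_pte_dirty)
{
    p->pte_dirty = mark_pte_dirty; /* new scheme marks the PTE dirty */
    p->pg_dirty = true;            /* set when leaving the swap cache */
    p->in_swapcache = false;
}

/* Model assumption: madvise_free clears only the PTE dirty bit. */
static void toy_madvise_free(struct toy_page *p)
{
    p->pte_dirty = false;
}

/* Replays the timeline; true if the page is discardable at the end. */
static bool discardable_after_madvise_free(bool new_scheme)
{
    struct toy_page p = { .pte_dirty = true };  /* state after memset */

    toy_swap_in(&p, new_scheme);
    /* Either scheme must protect the swapped-in data at this point. */
    assert(new_scheme ? !can_discard_new(&p) : !can_discard_old(&p));

    toy_madvise_free(&p);
    return new_scheme ? can_discard_new(&p) : can_discard_old(&p);
}
```

In this model, discardable_after_madvise_free(false) stays false
because of the leftover PG_dirty, while
discardable_after_madvise_free(true) becomes true.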

With this, it removes complicated logic and makes the freeable page
check for madvise_free simple (i.e., +90/-108).
Of course, it also solves the PG_dirty problem mentioned above.

I tested this patchset (memcg, tmpfs, swapon/off, THP, KSM) and
found no problems, but it still needs careful review.

Minchan Kim (6):
  mm: keep dirty bit on KSM page
  mm: keep dirty bit on anonymous page migration
  mm: mark dirty bit on swapped-in page
  mm: mark dirty bit on unuse_pte
  mm: decouple PG_dirty from MADV_FREE
  mm: MADV_FREE refactoring

 include/linux/rmap.h |  9 ++----
 mm/ksm.c             | 19 ++++++++++---
 mm/madvise.c         | 13 ---------
 mm/memory.c          |  6 ++--
 mm/migrate.c         |  4 +++
 mm/rmap.c            | 78 +++++++++++++++++++++++++---------------------------
 mm/swapfile.c        |  6 +++-
 mm/vmscan.c          | 63 ++++++++++++++----------------------------
 8 files changed, 90 insertions(+), 108 deletions(-)

-- 
1.9.1


Thread overview: 12+ messages
2015-06-03  6:15 Minchan Kim [this message]
2015-06-03  6:15 ` [RFC 1/6] mm: keep dirty bit on KSM page Minchan Kim
2015-06-03  6:15 ` [RFC 2/6] mm: keep dirty bit on anonymous page migration Minchan Kim
2015-06-03  6:15 ` [RFC 3/6] mm: mark dirty bit on swapped-in page Minchan Kim
2015-06-09 19:07   ` Cyrill Gorcunov
2015-06-09 23:52     ` Minchan Kim
2015-06-10  7:23       ` Cyrill Gorcunov
2015-06-10  8:00         ` Minchan Kim
2015-06-10  8:05           ` Cyrill Gorcunov
2015-06-03  6:15 ` [RFC 4/6] mm: mark dirty bit on unuse_pte Minchan Kim
2015-06-03  6:15 ` [RFC 5/6] mm: decouple PG_dirty from MADV_FREE Minchan Kim
2015-06-03  6:15 ` [RFC 6/6] mm: MADV_FREE refactoring Minchan Kim
