public inbox for linux-kernel@vger.kernel.org
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Shaohua Li <shli@kernel.org>,
	Yalin.Wang@sonymobile.com, Hugh Dickins <hughd@google.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page
Date: Mon, 30 Mar 2015 14:22:50 +0900
Message-ID: <20150330052250.GA3008@blaptop>
In-Reply-To: <1426036838-18154-4-git-send-email-minchan@kernel.org>

Second attempt at the description.

From ccfc6c79634f6cec69d8fb23b0e863ebfa5b893c Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 30 Mar 2015 13:43:08 +0900
Subject: [PATCH v2] mm: make every pte dirty on do_swap_page

Basically, MADV_FREE relies on the dirty bit in the page table entry
to decide whether the VM is allowed to discard the page. In other
words, if the page table entry has the dirty bit set, the VM must not
discard the page.

However, when a swap-in happens through a read fault, the page table
entry pointing to the page does not have the dirty bit set, so
MADV_FREE might discard the page wrongly. To avoid this problem,
MADV_FREE performed additional checks with PageDirty and
PageSwapCache. This worked because a swapped-in page lives in the
swap cache, and once it is removed from the swap cache, the page has
the PG_dirty flag set. So the two page flag checks together
effectively prevent MADV_FREE from discarding the page wrongly.
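
The pre-patch check can be sketched as a pure function over three
flags. This is a simplified userspace model, not kernel code: struct
page_state and old_may_discard() are hypothetical names, and the
condition mirrors the page_check_references() test that this patch
removes from mm/vmscan.c, leaving out the PageAnon() part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one anonymous page (not a kernel type). */
struct page_state {
	bool pte_dirty;  /* dirty bit in the page table entry */
	bool pg_dirty;   /* PG_dirty in the page descriptor */
	bool swapcache;  /* PG_swapcache: the page is still in the swap cache */
};

/*
 * Old rule: reclaim may discard a MADV_FREE page only if neither the pte
 * nor the page descriptor is dirty and the page is not in the swap cache.
 */
static bool old_may_discard(const struct page_state *p)
{
	return !p->pte_dirty && !p->pg_dirty && !p->swapcache;
}
```

Any one of the three flags is enough to keep the page, which is why a
stale PG_dirty alone can pin it in memory.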

The problem with the above logic is that a swapped-in page keeps
PG_dirty after it is removed from the swap cache, so the VM can no
longer consider such a page freeable, even if madvise_free is called
on it later. See the example below for details.

ptr = malloc(len);
memset(ptr, 0x1, len);
..
..
.. heavy memory pressure, so all of the pages are swapped out
..
..
var = *ptr; -> a page is swapped in and removed from the swap cache.
               The page table entry does not have the dirty bit set,
               but the page descriptor has PG_dirty.
..
..
madvise(ptr, len, MADV_FREE);
..
..
..
.. heavy memory pressure again.
.. This time, the VM cannot discard the page because the page
.. has *PG_dirty*.
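
The example above can be replayed as a small state model, assuming
the only state that matters is the pte dirty bit, PG_dirty and
PG_swapcache. All names are hypothetical. In this model MADV_FREE only
cleans the pte; the old ClearPageDirty() path for pages still in the
swap cache is not modelled, since the page here has already left it.
The sketch compares the old check with the one this patch leaves in
page_check_references().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one anonymous page (not a kernel type). */
struct page_state {
	bool pte_dirty;  /* dirty bit in the page table entry */
	bool pg_dirty;   /* PG_dirty in the page descriptor */
	bool swapcache;  /* PG_swapcache */
};

/* var = *ptr before this patch: clean pte, page leaves the swap cache dirty. */
static void swap_in_read_old(struct page_state *p)
{
	p->pte_dirty = false;
	p->swapcache = false;
	p->pg_dirty = true;
}

/* var = *ptr with this patch: the pte is made dirty on every swap-in. */
static void swap_in_read_new(struct page_state *p)
{
	p->pte_dirty = true;
	p->swapcache = false;
	p->pg_dirty = true;
}

/* MADV_FREE in this model: clean the pte and leave PG_dirty alone. */
static void madv_free(struct page_state *p)
{
	p->pte_dirty = false;
}

/* Check removed by this patch (anonymous pages only). */
static bool old_may_discard(const struct page_state *p)
{
	return !p->pte_dirty && !p->pg_dirty && !p->swapcache;
}

/* Check kept by this patch: PG_dirty is no longer consulted. */
static bool new_may_discard(const struct page_state *p)
{
	return !p->pte_dirty && !p->swapcache;
}

/* Replay the example: swap-in by read fault, MADV_FREE, then reclaim. */
static bool example_discardable(bool patched)
{
	struct page_state p = { false, false, false };

	if (patched)
		swap_in_read_new(&p);
	else
		swap_in_read_old(&p);
	madv_free(&p);
	return patched ? new_may_discard(&p) : old_may_discard(&p);
}
```

With the old rule the replay ends with a page that reclaim must keep;
with the patched swap-in path the same sequence leaves a discardable
page, while a swapped-in page that was never passed to MADV_FREE still
has a dirty pte and is kept.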

Rather than relying on PG_dirty in the page descriptor to prevent a
page from being discarded, using the dirty bit in the page table is
more straightforward and simpler. So this patch sets the dirty bit in
the page table entry whenever a swap-in happens. Inherently, the page
table entry pointing to a page that was swapped out had the dirty bit
set, so I don't think this is a problem.
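
A bit-level sketch of the pte that the patched do_swap_page() installs.
The bit values and model_* helpers are hypothetical stand-ins for
mk_pte(), pte_mkdirty() and maybe_mkwrite(); real pte layouts are
architecture specific. The point is the ordering: the dirty bit is set
unconditionally, and write permission is still added only for a write
fault that can reuse the swap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pte bits for this model; real layouts are per-architecture. */
#define M_PTE_PRESENT (1u << 0)
#define M_PTE_WRITE   (1u << 1)
#define M_PTE_DIRTY   (1u << 2)

typedef uint32_t model_pte_t;

/* Stand-in for mk_pte(): present, clean, read-only. */
static model_pte_t model_mk_pte(void)
{
	return M_PTE_PRESENT;
}

/* Stand-in for pte_mkdirty(). */
static model_pte_t model_pte_mkdirty(model_pte_t pte)
{
	return pte | M_PTE_DIRTY;
}

/* Stand-in for maybe_mkwrite(): add write access only if the VMA is writable. */
static model_pte_t model_maybe_mkwrite(model_pte_t pte, bool vma_writable)
{
	return vma_writable ? (pte | M_PTE_WRITE) : pte;
}

/* Pte built by the patched do_swap_page(): always dirty, writable only on reuse. */
static model_pte_t model_swapin_pte(bool write_fault, bool can_reuse,
				    bool vma_writable)
{
	model_pte_t pte = model_pte_mkdirty(model_mk_pte());

	if (write_fault && can_reuse)
		pte = model_maybe_mkwrite(pte, vma_writable);
	return pte;
}
```

So a read fault now yields a pte that is dirty but not writable, which
is the state MADV_FREE later cleans.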

This removes the complicated logic and makes madvise_free's check for
freeable pages simple. It also fixes the problem in the example
above.

Cc: Hugh Dickins <hughd@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Reported-by: Yalin Wang <yalin.wang@sonymobile.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---

* From v1:
  * Rewrite description - Andrew

 mm/madvise.c |  1 -
 mm/memory.c  | 10 ++++++++--
 mm/rmap.c    |  2 +-
 mm/vmscan.c  |  3 +--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 22e8f0c..a045798 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				continue;
 			}
 
-			ClearPageDirty(page);
 			unlock_page(page);
 		}
 
diff --git a/mm/memory.c b/mm/memory.c
index 6743966..48ff537 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2521,9 +2521,15 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
-	pte = mk_pte(page, vma->vm_page_prot);
+
+	/*
+	 * The page being swapped in was dirty before it was swapped out,
+	 * so restore that state (i.e., pte_mkdirty) because MADV_FREE
+	 * relies on the dirty bit in the page table.
+	 */
+	pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
 	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = maybe_mkwrite(pte, vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index dad23a4..281e806 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1275,7 +1275,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 		if (flags & TTU_FREE) {
 			VM_BUG_ON_PAGE(PageSwapCache(page), page);
-			if (!dirty && !PageDirty(page)) {
+			if (!dirty) {
 				/* It's a freeable page by MADV_FREE */
 				dec_mm_counter(mm, MM_ANONPAGES);
 				goto discard;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dc6cd51..fffebf0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
 		return PAGEREF_KEEP;
 	}
 
-	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
-			!PageDirty(page))
+	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
 		*freeable = true;
 
 	/* Reclaim if clean, defer dirty pages to writeback */
-- 
1.9.3

-- 
Kind regards,
Minchan Kim

Thread overview: 28+ messages
2015-03-11  1:20 [PATCH 1/4] mm: free swp_entry in madvise_free Minchan Kim
2015-03-11  1:20 ` [PATCH 2/4] mm: change deactivate_page with deactivate_file_page Minchan Kim
2015-03-11  1:20 ` [PATCH 3/4] mm: move lazy free pages to inactive list Minchan Kim
2015-03-11  2:14   ` Wang, Yalin
2015-03-11  4:30     ` Minchan Kim
2015-04-01 20:38     ` Rik van Riel
2015-03-11  9:05   ` [RFC ] mm: don't ignore file map pages for madvise_free( ) Wang, Yalin
2015-03-11  9:47   ` [RFC] mm:do recheck for freeable page in reclaim path Wang, Yalin
2015-03-20 22:43   ` [PATCH 3/4] mm: move lazy free pages to inactive list Andrew Morton
2015-03-30  5:35     ` Minchan Kim
2015-03-30 21:20       ` Andrew Morton
2015-03-31  4:45         ` Minchan Kim
2015-03-31  5:28           ` Andrew Morton
2015-03-31  5:57             ` Minchan Kim
2015-03-11  1:20 ` [PATCH 4/4] mm: make every pte dirty on do_swap_page Minchan Kim
2015-03-30  5:22   ` Minchan Kim [this message]
2015-03-30  8:51     ` Cyrill Gorcunov
2015-03-30  8:59       ` Minchan Kim
2015-03-30 21:14         ` Cyrill Gorcunov
2015-03-31  4:38           ` Minchan Kim
2015-04-08 23:50   ` Minchan Kim
2015-04-09 20:59     ` Andrew Morton
2015-04-10  0:08       ` Minchan Kim
2015-04-10  0:14       ` Rik van Riel
2015-04-11 21:40   ` Hugh Dickins
2015-04-12 14:48     ` Minchan Kim
2015-04-15  6:49       ` Minchan Kim
2015-03-19  0:46 ` [PATCH 1/4] mm: free swp_entry in madvise_free Minchan Kim
