From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Shaohua Li <shli@kernel.org>,
	Yalin.Wang@sonymobile.com, Hugh Dickins <hughd@google.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page
Date: Thu, 9 Apr 2015 08:50:25 +0900
Message-ID: <20150408235012.GA13690@blaptop>
In-Reply-To: <1426036838-18154-4-git-send-email-minchan@kernel.org>

Bump.

On Wed, Mar 11, 2015 at 10:20:38AM +0900, Minchan Kim wrote:
> Basically, MADV_FREE relies on the pte dirty bit to decide whether
> the VM may discard a page. However, after a swap-in, the pte that
> maps the page is not dirty. So MADV_FREE also checks PageDirty and
> PageSwapCache for those pages to avoid discarding them, because a
> swapped-in page either still lives in the swap cache or has
> PageDirty set once it is removed from the swap cache.
> 
> The problem is that an anonymous page can have PageDirty set after
> it is removed from the swap cache, so the VM cannot treat such a page
> as freeable even after madvise_free. Look at the example below.
> 
> ptr = malloc(size);
> memset(ptr, 1, size);
> ..
> heavy memory pressure -> all pages are swapped out
> ..
> memory pressure goes away, so there are lots of free pages
> ..
> var = *ptr; -> swap-in, and the page is removed from the swap cache,
>                so the pte is clean but PageDirty is set
> 
> madvise(ptr, size, MADV_FREE);
> ..
> ..
> heavy memory pressure -> VM cannot discard the page because of PageDirty.
> 
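The example above can be modelled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: struct anon_page_model and the model_* helpers are invented for illustration, only read faults are modelled, and freeing the swap slot in madvise_free is assumed to succeed. It encodes the pre-patch rules: a clean pte after swap-in, ClearPageDirty only for pages still in the swap cache, and the !PageDirty test in the freeable check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one anonymous page; not a kernel structure. */
struct anon_page_model {
	bool mapped;     /* a pte currently maps the page */
	bool pte_dirty;  /* dirty bit in that pte */
	bool page_dirty; /* PG_dirty, i.e. PageDirty() */
	bool swap_cache; /* PG_swapcache, i.e. PageSwapCache() */
};

/* Memory pressure: contents go to swap, the page is unmapped and freed. */
static void model_swap_out(struct anon_page_model *p)
{
	assert(p->mapped);
	p->mapped = false;
	p->pte_dirty = false;
	p->page_dirty = false;
	p->swap_cache = false; /* only the swap slot holds the data now */
}

/*
 * Pre-patch read-fault swap-in: mk_pte() gives a clean pte. If the
 * page is then dropped from the swap cache, PageDirty is set so that a
 * later reclaim writes it out again.
 */
static void model_swap_in_old(struct anon_page_model *p, bool drop_swap_cache)
{
	assert(!p->mapped);
	p->mapped = true;
	p->pte_dirty = false;
	p->swap_cache = !drop_swap_cache;
	p->page_dirty = drop_swap_cache;
}

/*
 * Pre-patch madvise_free: clean the pte; for a page still in the swap
 * cache, free the swap slot and ClearPageDirty().
 */
static void model_madvise_free_old(struct anon_page_model *p)
{
	p->pte_dirty = false;
	if (p->swap_cache) {
		p->swap_cache = false;
		p->page_dirty = false;
	}
}

/* Pre-patch freeable test (PageAnon assumed). */
static bool model_freeable_old(const struct anon_page_model *p)
{
	return !p->pte_dirty && !p->swap_cache && !p->page_dirty;
}
```

Running swap-out, a swap-in that drops the page from the swap cache, and then madvise_free leaves page_dirty set, so model_freeable_old() returns false. Had the page stayed in the swap cache, the ClearPageDirty path would have made it freeable.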
> PageDirty on an anonymous page exists to avoid a redundant swap-out.
> In other words, if a page has been swapped in but still lives in the
> swap cache (i.e., !PageDirty), we can skip the swap-out if the VM
> later selects it as a victim, because the swap device still holds
> the page's previously swapped-out contents.
> 
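The swap-cache shortcut described above roughly reduces to the decision below. needs_swap_write() is a hypothetical helper, not a kernel function. It takes pte_dirty as a separate argument only for clarity; in the kernel, a dirty pte is folded into PageDirty when the page is unmapped during reclaim.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function: does reclaiming this
 * anonymous page require writing its contents to swap?
 */
static bool needs_swap_write(bool in_swap_cache, bool page_dirty, bool pte_dirty)
{
	/* No swap slot yet: the page must be added to swap and written. */
	if (!in_swap_cache)
		return true;
	/*
	 * In the swap cache: if the page was modified (PageDirty, or a
	 * dirty pte not yet folded into PageDirty), the swap copy is stale
	 * and must be rewritten. Otherwise the swap device already holds
	 * identical contents and the page can simply be dropped.
	 */
	return page_dirty || pte_dirty;
}
```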
> So, rather than relying on PG_dirty to make madvise_free work, using
> pte_dirty is more straightforward. A swapped-out page was inherently
> pte_dirty, so this patch restores that dirtiness when the swap-in
> fault happens, and madvise_free no longer relies on PageDirty.
> 
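With the patch applied, the same scenario can be modelled as follows. Again this is a hypothetical user-space sketch with invented names, covering read faults only and assuming the swap slot is freed successfully: swap-in now yields a dirty pte (pte_mkdirty()), madvise_free cleans it, and the freeable check ignores PageDirty. When madvise_free frees the swap slot, the model sets page_dirty, as try_to_free_swap() does, to show that the flag no longer affects the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one anonymous page; not a kernel structure. */
struct anon_page_model {
	bool mapped;     /* a pte currently maps the page */
	bool pte_dirty;  /* dirty bit in that pte */
	bool page_dirty; /* PG_dirty, i.e. PageDirty() */
	bool swap_cache; /* PG_swapcache, i.e. PageSwapCache() */
};

/* Memory pressure: contents go to swap, the page is unmapped and freed. */
static void model_swap_out(struct anon_page_model *p)
{
	assert(p->mapped);
	p->mapped = false;
	p->pte_dirty = false;
	p->page_dirty = false;
	p->swap_cache = false;
}

/* Patched read-fault swap-in: the pte is made dirty again. */
static void model_swap_in_new(struct anon_page_model *p, bool drop_swap_cache)
{
	assert(!p->mapped);
	p->mapped = true;
	p->pte_dirty = true;
	p->swap_cache = !drop_swap_cache;
	p->page_dirty = drop_swap_cache;
}

/*
 * Patched madvise_free: clean the pte; freeing the swap slot leaves
 * PageDirty set because the ClearPageDirty() call was removed.
 */
static void model_madvise_free_new(struct anon_page_model *p)
{
	p->pte_dirty = false;
	if (p->swap_cache) {
		p->swap_cache = false;
		p->page_dirty = true;
	}
}

/* Patched freeable test: PageDirty is no longer consulted. */
static bool model_freeable_new(const struct anon_page_model *p)
{
	return !p->pte_dirty && !p->swap_cache;
}
```

Before madvise_free, the dirty pte keeps the swapped-in page from being treated as freeable; afterwards the page is freeable whether or not it stayed in the swap cache.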
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Reported-by: Yalin Wang <yalin.wang@sonymobile.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/madvise.c | 1 -
>  mm/memory.c  | 9 +++++++--
>  mm/rmap.c    | 2 +-
>  mm/vmscan.c  | 3 +--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 22e8f0c..a045798 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  				continue;
>  			}
>  
> -			ClearPageDirty(page);
>  			unlock_page(page);
>  		}
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 0f96a4a..40428a5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2521,9 +2521,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>  	inc_mm_counter_fast(mm, MM_ANONPAGES);
>  	dec_mm_counter_fast(mm, MM_SWAPENTS);
> -	pte = mk_pte(page, vma->vm_page_prot);
> +
> +	/*
> +	 * Every page swapped-out was pte_dirty so we make pte dirty again.
> +	 * MADV_FREE relies on it.
> +	 */
> +	pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
>  	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> -		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> +		pte = maybe_mkwrite(pte, vma);
>  		flags &= ~FAULT_FLAG_WRITE;
>  		ret |= VM_FAULT_WRITE;
>  		exclusive = 1;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47b3ba8..34c1d66 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  
>  		if (flags & TTU_FREE) {
>  			VM_BUG_ON_PAGE(PageSwapCache(page), page);
> -			if (!dirty && !PageDirty(page)) {
> +			if (!dirty) {
>  				/* It's a freeable page by MADV_FREE */
>  				dec_mm_counter(mm, MM_ANONPAGES);
>  				goto discard;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413..3357ffa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
>  		return PAGEREF_KEEP;
>  	}
>  
> -	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
> -			!PageDirty(page))
> +	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
>  		*freeable = true;
>  
>  	/* Reclaim if clean, defer dirty pages to writeback */
> -- 
> 1.9.3
> 

-- 
Kind regards,
Minchan Kim

