All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Shaohua Li <shli@kernel.org>,
	Yalin Wang <Yalin.Wang@sonymobile.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page
Date: Wed, 15 Apr 2015 15:49:45 +0900	[thread overview]
Message-ID: <20150415064945.GA22700@blaptop> (raw)
In-Reply-To: <20150412144823.GA414@blaptop>

On Sun, Apr 12, 2015 at 11:48:23PM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Sat, Apr 11, 2015 at 02:40:46PM -0700, Hugh Dickins wrote:
> > On Wed, 11 Mar 2015, Minchan Kim wrote:
> > 
> > > Bascially, MADV_FREE relys on the pte dirty to decide whether
> > > it allows VM to discard the page. However, if there is swap-in,
> > > pte pointed out the page has no pte_dirty. So, MADV_FREE checks
> > > PageDirty and PageSwapCache for those pages to not discard it
> > > because swapped-in page could live on swap cache or PageDirty
> > > when it is removed from swapcache.
> > > 
> > > The problem in here is that anonymous pages can have PageDirty if
> > > it is removed from swapcache so that VM cannot parse those pages
> > > as freeable even if we did madvise_free. Look at below example.
> > > 
> > > ptr = malloc();
> > > memset(ptr);
> > > ..
> > > heavy memory pressure -> swap-out all of pages
> > > ..
> > > out of memory pressure so there are lots of free pages
> > > ..
> > > var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
> > >                but SetPageDirty
> > > 
> > > madvise_free(ptr);
> > > ..
> > > ..
> > > heavy memory pressure -> VM cannot discard the page by PageDirty.
> > > 
> > > PageDirty for anonymous page aims for avoiding duplicating
> > > swapping out. In other words, if a page have swapped-in but
> > > live swapcache(ie, !PageDirty), we could save swapout if the page
> > > is selected as victim by VM in future because swap device have
> > > kept previous swapped-out contents of the page.
> > > 
> > > So, rather than relying on the PG_dirty for working madvise_free,
> > > pte_dirty is more straightforward. Inherently, swapped-out page was
> > > pte_dirty so this patch restores the dirtiness when swap-in fault
> > > happens so madvise_free doesn't rely on the PageDirty any more.
> > > 
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> > > Cc: Pavel Emelyanov <xemul@parallels.com>
> > > Reported-by: Yalin Wang <yalin.wang@sonymobile.com>
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Sorry, but NAK to this patch,
> > mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
> > (I hope it hasn't reached linux-next yet).
> > 
> > You may well be right that pte_dirty<->PageDirty can be handled
> > differently, in a way more favourable to MADV_FREE.  And this patch
> > may be a step in the right direction, but I've barely given it thought.
> > 
> > As it stands, it segfaults more than any patch I've seen in years:
> > I just tried applying it to 4.0-rc7-mm1, and running kernel builds
> > in low memory with swap.  Even if I leave KSM out, and memcg out, and
> > swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.
> > 
> > I have a choice: spend a few hours tracking down the errors, and
> > post a fix patch on top of yours?  But even then I'd want to spend
> > a lot longer thinking through every dirty/Dirty in the source before
> > I'd feel comfortable to give an ack.
> > 
> > This is users' data, and we need to be very careful with it: errors
> > in MADV_FREE are one thing, for now that's easy to avoid; but in this
> > patch you're changing the rules for Anon PageDirty for everyone.
> > 
> > I think for now I'll have to leave it to you to do much more source
> > diligence and testing, before coming back with a corrected patch for
> > us then to review, slowly and carefully.
> 
> Sorry for my bad. I will keep your advise in mind.
> I will investigate the problem as soon as I get back to work
> after vacation.
> 
> Thanks for the the review.

When I look at the code, migration doesn't restore dirty bit of pte
in remove_migration_pte and relys on PG_dirty which was set by
try_to_unmap_one. I think it was a reason you saw segfault.
I will spend more time to investigate another code piece which might
ignore dirty bit restore.

Thanks.



> 
> -- 
> Kind regards,
> Minchan Kim

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Shaohua Li <shli@kernel.org>,
	Yalin Wang <Yalin.Wang@sonymobile.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page
Date: Wed, 15 Apr 2015 15:49:45 +0900	[thread overview]
Message-ID: <20150415064945.GA22700@blaptop> (raw)
In-Reply-To: <20150412144823.GA414@blaptop>

On Sun, Apr 12, 2015 at 11:48:23PM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Sat, Apr 11, 2015 at 02:40:46PM -0700, Hugh Dickins wrote:
> > On Wed, 11 Mar 2015, Minchan Kim wrote:
> > 
> > > Bascially, MADV_FREE relys on the pte dirty to decide whether
> > > it allows VM to discard the page. However, if there is swap-in,
> > > pte pointed out the page has no pte_dirty. So, MADV_FREE checks
> > > PageDirty and PageSwapCache for those pages to not discard it
> > > because swapped-in page could live on swap cache or PageDirty
> > > when it is removed from swapcache.
> > > 
> > > The problem in here is that anonymous pages can have PageDirty if
> > > it is removed from swapcache so that VM cannot parse those pages
> > > as freeable even if we did madvise_free. Look at below example.
> > > 
> > > ptr = malloc();
> > > memset(ptr);
> > > ..
> > > heavy memory pressure -> swap-out all of pages
> > > ..
> > > out of memory pressure so there are lots of free pages
> > > ..
> > > var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
> > >                but SetPageDirty
> > > 
> > > madvise_free(ptr);
> > > ..
> > > ..
> > > heavy memory pressure -> VM cannot discard the page by PageDirty.
> > > 
> > > PageDirty for anonymous page aims for avoiding duplicating
> > > swapping out. In other words, if a page have swapped-in but
> > > live swapcache(ie, !PageDirty), we could save swapout if the page
> > > is selected as victim by VM in future because swap device have
> > > kept previous swapped-out contents of the page.
> > > 
> > > So, rather than relying on the PG_dirty for working madvise_free,
> > > pte_dirty is more straightforward. Inherently, swapped-out page was
> > > pte_dirty so this patch restores the dirtiness when swap-in fault
> > > happens so madvise_free doesn't rely on the PageDirty any more.
> > > 
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> > > Cc: Pavel Emelyanov <xemul@parallels.com>
> > > Reported-by: Yalin Wang <yalin.wang@sonymobile.com>
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Sorry, but NAK to this patch,
> > mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
> > (I hope it hasn't reached linux-next yet).
> > 
> > You may well be right that pte_dirty<->PageDirty can be handled
> > differently, in a way more favourable to MADV_FREE.  And this patch
> > may be a step in the right direction, but I've barely given it thought.
> > 
> > As it stands, it segfaults more than any patch I've seen in years:
> > I just tried applying it to 4.0-rc7-mm1, and running kernel builds
> > in low memory with swap.  Even if I leave KSM out, and memcg out, and
> > swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.
> > 
> > I have a choice: spend a few hours tracking down the errors, and
> > post a fix patch on top of yours?  But even then I'd want to spend
> > a lot longer thinking through every dirty/Dirty in the source before
> > I'd feel comfortable to give an ack.
> > 
> > This is users' data, and we need to be very careful with it: errors
> > in MADV_FREE are one thing, for now that's easy to avoid; but in this
> > patch you're changing the rules for Anon PageDirty for everyone.
> > 
> > I think for now I'll have to leave it to you to do much more source
> > diligence and testing, before coming back with a corrected patch for
> > us then to review, slowly and carefully.
> 
> Sorry for my bad. I will keep your advise in mind.
> I will investigate the problem as soon as I get back to work
> after vacation.
> 
> Thanks for the the review.

When I look at the code, migration doesn't restore dirty bit of pte
in remove_migration_pte and relys on PG_dirty which was set by
try_to_unmap_one. I think it was a reason you saw segfault.
I will spend more time to investigate another code piece which might
ignore dirty bit restore.

Thanks.



> 
> -- 
> Kind regards,
> Minchan Kim

-- 
Kind regards,
Minchan Kim

  reply	other threads:[~2015-04-15  6:49 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-11  1:20 [PATCH 1/4] mm: free swp_entry in madvise_free Minchan Kim
2015-03-11  1:20 ` Minchan Kim
2015-03-11  1:20 ` [PATCH 2/4] mm: change deactivate_page with deactivate_file_page Minchan Kim
2015-03-11  1:20   ` Minchan Kim
2015-03-11  1:20 ` [PATCH 3/4] mm: move lazy free pages to inactive list Minchan Kim
2015-03-11  1:20   ` Minchan Kim
2015-03-11  2:14   ` Wang, Yalin
2015-03-11  2:14     ` Wang, Yalin
2015-03-11  4:30     ` Minchan Kim
2015-03-11  4:30       ` Minchan Kim
2015-04-01 20:38     ` Rik van Riel
2015-04-01 20:38       ` Rik van Riel
2015-03-11  9:05   ` [RFC ] mm: don't ignore file map pages for madvise_free( ) Wang, Yalin
2015-03-11  9:05     ` Wang, Yalin
2015-03-11  9:47   ` [RFC] mm:do recheck for freeable page in reclaim path Wang, Yalin
2015-03-11  9:47     ` Wang, Yalin
2015-03-20 22:43   ` [PATCH 3/4] mm: move lazy free pages to inactive list Andrew Morton
2015-03-20 22:43     ` Andrew Morton
2015-03-30  5:35     ` Minchan Kim
2015-03-30  5:35       ` Minchan Kim
2015-03-30 21:20       ` Andrew Morton
2015-03-30 21:20         ` Andrew Morton
2015-03-31  4:45         ` Minchan Kim
2015-03-31  4:45           ` Minchan Kim
2015-03-31  5:28           ` Andrew Morton
2015-03-31  5:28             ` Andrew Morton
2015-03-31  5:57             ` Minchan Kim
2015-03-31  5:57               ` Minchan Kim
2015-03-11  1:20 ` [PATCH 4/4] mm: make every pte dirty on do_swap_page Minchan Kim
2015-03-11  1:20   ` Minchan Kim
2015-03-30  5:22   ` Minchan Kim
2015-03-30  5:22     ` Minchan Kim
2015-03-30  8:51     ` Cyrill Gorcunov
2015-03-30  8:51       ` Cyrill Gorcunov
2015-03-30  8:59       ` Minchan Kim
2015-03-30  8:59         ` Minchan Kim
2015-03-30 21:14         ` Cyrill Gorcunov
2015-03-30 21:14           ` Cyrill Gorcunov
2015-03-31  4:38           ` Minchan Kim
2015-03-31  4:38             ` Minchan Kim
2015-04-08 23:50   ` Minchan Kim
2015-04-08 23:50     ` Minchan Kim
2015-04-09 20:59     ` Andrew Morton
2015-04-09 20:59       ` Andrew Morton
2015-04-10  0:08       ` Minchan Kim
2015-04-10  0:08         ` Minchan Kim
2015-04-10  0:14       ` Rik van Riel
2015-04-10  0:14         ` Rik van Riel
2015-04-11 21:40   ` Hugh Dickins
2015-04-11 21:40     ` Hugh Dickins
2015-04-12 14:48     ` Minchan Kim
2015-04-12 14:48       ` Minchan Kim
2015-04-15  6:49       ` Minchan Kim [this message]
2015-04-15  6:49         ` Minchan Kim
2015-03-19  0:46 ` [PATCH 1/4] mm: free swp_entry in madvise_free Minchan Kim
2015-03-19  0:46   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150415064945.GA22700@blaptop \
    --to=minchan@kernel.org \
    --cc=Yalin.Wang@sonymobile.com \
    --cc=akpm@linux-foundation.org \
    --cc=gorcunov@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=mm-commits@vger.kernel.org \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.