From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
David Rientjes <rientjes@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 04/14] mm,migration: Allow the migration of PageSwapCache pages
Date: Fri, 23 Apr 2010 20:23:12 +0100 [thread overview]
Message-ID: <20100423192311.GC14351@csn.ul.ie> (raw)
In-Reply-To: <20100423183135.GT32034@random.random>
On Fri, Apr 23, 2010 at 08:31:35PM +0200, Andrea Arcangeli wrote:
> Hi Mel,
>
> On Thu, Apr 22, 2010 at 04:44:43PM +0100, Mel Gorman wrote:
> > heh, I thought of a similar approach at the same time as you but missed
> > this mail until later. However, with this approach I suspect there is a
> > possibility that two walkers of the same anon_vma list could livelock if
> > two locks on the list are held at the same time. Am still thinking of
> > how it could be resolved without introducing new locking.
>
> Trying to understand this issue and I've some questions. This
> vma_adjust and lock inversion troubles with the anon-vma lock in
> rmap_walk are a new issue introduced by the recent anon-vma changes,
> right?
>
In a manner of speaking. There was no locking going on but prior to the
anon_vma changes, there would have been only one anon_vma lock and the
fix would be easier - just take the lock on anon_vma->lock while the
VMAs are being updated.
> About swapcache, try_to_unmap just nuke the mappings, establish the
> swap entry in the pte (not migration entry), and then there's no need
> to call remove_migration_ptes.
That would be an alternative for swapcache but it's not necessarily
where the problem is.
> So it just need to skip it for
> swapcache. page_mapped must return zero after try_to_unmap returns
> before we're allowed to migrate (plus the page count must be just 1
> and not 2 or more for gup users!).
>
> I don't get what's the problem about swapcache and the races connected
> to it, the moment I hear migration PTE in context of swapcache
> migration I'm confused because there's no migration PTE for swapcache.
>
That was a mistake on my part. The race appears to be between vma_adjust
changing the details of the VMA while rmap_walk is going on. It mistakenly
believes the vma no longer spans the address, gets -EFAULT from vma_address
and doesn't clean up the migration PTE. This is later encountered but the
page lock is no longer held and it bugs. An alternative would be to clean
up the migration PTE of unlocked pages on the assumption it was due to this
race but it's a bit sloppier.
> The new page will have no mappings either, it just needs to be part of
> the swapcache with the same page->index = swapentry, indexed in the
> radix tree with that page->index, and paga->mapping pointing to
> swapcache. Then new page faults will bring it in the pageatables. The
> lookup_swap_cache has to be serialized against some lock, it should be
> the radix tree lock? So the migration has to happen with that lock
> hold no?
Think migrate_page_move_mapping() is what you're looking for? It takes
the mapping tree lock.
>We can't just migrate swapcache without stopping swapcache
> radix tree lookups no? I didn't digest the full migrate.c yet and I
> don't see where it happens. Freeing the swapcache while simpler and
> safer, would be quite bad as it'd create I/O for potentially hot-ram.
>
> About the refcounting of anon-vma in migrate.c I think it'd be much
> simpler if zap_page_range and folks would just wait (like they do if
> they find a pmd_trans_huge && pmd_trans_splitting pmd), there would be
> no need of refcounting the anon-vma that way.
>
I'm not getting what you're suggesting here. The refcount is to make
sure the anon_vma doesn't go away after the page mapcount reaches zero.
What are we waiting for?
> I assume whatever is added to rmap_walk I also have to add to
> split_huge_page later when switching to mainline anon-vma code (for
> now I stick to 2.6.32 anon-vma code to avoid debugging anon-vma-chain,
> memory compaction, swapcache migration and transparent hugepage at the
> same time, which becomes a little beyond feasibility).
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-04-23 19:23 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-20 21:01 [PATCH 0/14] Memory Compaction v8 Mel Gorman
2010-04-20 21:01 ` [PATCH 01/14] mm,migration: Take a reference to the anon_vma before migrating Mel Gorman
2010-04-21 2:49 ` KAMEZAWA Hiroyuki
2010-04-20 21:01 ` [PATCH 02/14] mm,migration: Share the anon_vma ref counts between KSM and page migration Mel Gorman
2010-04-20 21:01 ` [PATCH 03/14] mm,migration: Do not try to migrate unmapped anonymous pages Mel Gorman
2010-04-20 21:01 ` [PATCH 04/14] mm,migration: Allow the migration of PageSwapCache pages Mel Gorman
2010-04-21 14:30 ` Christoph Lameter
2010-04-21 15:00 ` Mel Gorman
2010-04-21 15:05 ` Christoph Lameter
2010-04-21 15:14 ` Mel Gorman
2010-04-21 15:31 ` Christoph Lameter
2010-04-21 15:34 ` Mel Gorman
2010-04-21 15:46 ` Christoph Lameter
2010-04-22 9:28 ` Mel Gorman
2010-04-22 9:46 ` KAMEZAWA Hiroyuki
2010-04-22 10:13 ` Minchan Kim
2010-04-22 10:31 ` KAMEZAWA Hiroyuki
2010-04-22 10:51 ` KAMEZAWA Hiroyuki
2010-04-22 14:14 ` Mel Gorman
2010-04-22 14:18 ` Minchan Kim
2010-04-22 15:40 ` Mel Gorman
2010-04-22 16:13 ` Mel Gorman
2010-04-22 19:29 ` Mel Gorman
2010-04-22 19:40 ` Christoph Lameter
2010-04-22 23:52 ` KAMEZAWA Hiroyuki
2010-04-23 9:03 ` Mel Gorman
2010-04-22 14:23 ` Minchan Kim
2010-04-22 14:40 ` Minchan Kim
2010-04-22 15:44 ` Mel Gorman
2010-04-23 18:31 ` Andrea Arcangeli
2010-04-23 19:23 ` Mel Gorman [this message]
2010-04-23 19:39 ` Andrea Arcangeli
2010-04-23 21:35 ` Andrea Arcangeli
2010-04-24 10:52 ` Mel Gorman
2010-04-24 11:13 ` Andrea Arcangeli
2010-04-24 11:59 ` Mel Gorman
2010-04-24 14:30 ` Andrea Arcangeli
2010-04-26 21:54 ` Rik van Riel
2010-04-26 22:11 ` Mel Gorman
2010-04-26 22:26 ` Andrea Arcangeli
2010-04-25 14:41 ` Andrea Arcangeli
2010-04-27 9:40 ` Mel Gorman
2010-04-27 10:41 ` KAMEZAWA Hiroyuki
2010-04-27 11:12 ` Mel Gorman
2010-04-27 15:42 ` Andrea Arcangeli
2010-04-24 10:50 ` Mel Gorman
2010-04-22 15:14 ` Christoph Lameter
2010-04-23 3:39 ` Paul E. McKenney
2010-04-23 4:55 ` Minchan Kim
2010-04-21 23:59 ` KAMEZAWA Hiroyuki
2010-04-22 0:11 ` Minchan Kim
2010-04-20 21:01 ` [PATCH 05/14] mm: Allow CONFIG_MIGRATION to be set without CONFIG_NUMA or memory hot-remove Mel Gorman
2010-04-20 21:01 ` [PATCH 06/14] mm: Export unusable free space index via debugfs Mel Gorman
2010-04-20 21:01 ` [PATCH 07/14] mm: Export fragmentation " Mel Gorman
2010-04-20 21:01 ` [PATCH 08/14] mm: Move definition for LRU isolation modes to a header Mel Gorman
2010-04-20 21:01 ` [PATCH 09/14] mm,compaction: Memory compaction core Mel Gorman
2010-04-20 21:01 ` [PATCH 10/14] mm,compaction: Add /proc trigger for memory compaction Mel Gorman
2010-04-20 21:01 ` [PATCH 11/14] mm,compaction: Add /sys trigger for per-node " Mel Gorman
2010-04-20 21:01 ` [PATCH 12/14] mm,compaction: Direct compact when a high-order allocation fails Mel Gorman
2010-05-05 12:19 ` [PATCH] fix count_vm_event preempt in memory compaction direct reclaim Andrea Arcangeli
2010-05-05 12:51 ` Mel Gorman
2010-05-05 13:11 ` Andrea Arcangeli
2010-05-05 13:55 ` Mel Gorman
2010-05-05 14:48 ` Andrea Arcangeli
2010-05-05 15:14 ` Mel Gorman
2010-05-05 15:25 ` Andrea Arcangeli
2010-05-05 15:32 ` Mel Gorman
2010-04-20 21:01 ` [PATCH 13/14] mm,compaction: Add a tunable that decides when memory should be compacted and when it should be reclaimed Mel Gorman
2010-04-20 21:01 ` [PATCH 14/14] mm,compaction: Defer compaction using an exponential backoff when compaction fails Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100423192311.GC14351@csn.ul.ie \
--to=mel@csn.ul.ie \
--cc=aarcange@redhat.com \
--cc=agl@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=cl@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan.kim@gmail.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).