From: Mel Gorman <mel@csn.ul.ie>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: mmotm hangs on compaction lock_page
Date: Fri, 7 Jan 2011 17:57:05 +0000 [thread overview]
Message-ID: <20110107175705.GL29257@csn.ul.ie> (raw)
In-Reply-To: <20110107145259.GK29257@csn.ul.ie>
On Fri, Jan 07, 2011 at 02:52:59PM +0000, Mel Gorman wrote:
> On Thu, Jan 06, 2011 at 05:20:25PM -0800, Hugh Dickins wrote:
> > Hi Mel,
> >
> > Here are the traces of two concurrent "cp -a kerneltree elsewhere"s
> > which have hung on mmotm: in limited RAM, on a PowerPC
>
> How limited in RAM and how many CPUs?
>
> > - I don't have
> > an explanation for why I can reproduce it in minutes on that box but
> > never yet on the x86s.
> >
>
> Strongest bet is simply that compaction is not triggering for you on the
> x86 boxes. Monitor "grep compact /proc/vmstat" on the two machines and
> see if the counters are growing on powerpc and not on x86. I'm trying to
> reproduce the problem locally but no luck yet.
>
> > Perhaps we can get it to happen with just one cp: the second cp here
> > seemed to be deadlocking itself, unmap_and_move()'s force lock_page
> > waiting on a page which its page cache readahead already holds locked.
> >
> > cp D 000000000fea3110 0 18874 18873 0x00008010
> > Call Trace:
> > .__switch_to+0xcc/0x110
> > .schedule+0x670/0x7b0
> > .io_schedule+0x50/0x8c
> > .sync_page+0x84/0xa0
> > .sync_page_killable+0x10/0x48
> > .__wait_on_bit_lock+0x9c/0x140
> > .__lock_page_killable+0x74/0x98
> > .do_generic_file_read+0x2b0/0x504
> > .generic_file_aio_read+0x214/0x29c
> > .do_sync_read+0xac/0x10c
> > .vfs_read+0xd0/0x1a0
> > .SyS_read+0x58/0xa0
> > syscall_exit+0x0/0x40
> > syscall_exit+0x0/0x40
> >
> > cp D 000000000fea3110 0 18876 18875 0x00008010
> > Call Trace:
> > 0xc000000001343b68 (unreliable)
> > .__switch_to+0xcc/0x110
> > .schedule+0x670/0x7b0
> > .io_schedule+0x50/0x8c
> > .sync_page+0x84/0xa0
> > .__wait_on_bit_lock+0x9c/0x140
> > .__lock_page+0x74/0x98
> > .unmap_and_move+0xfc/0x380
> > .migrate_pages+0xbc/0x18c
> > .compact_zone+0xbc/0x400
> > .compact_zone_order+0xc8/0xf4
> > .try_to_compact_pages+0x104/0x1b8
> > .__alloc_pages_direct_compact+0xa8/0x228
> > .__alloc_pages_nodemask+0x42c/0x730
> > .allocate_slab+0x84/0x168
> > .new_slab+0x58/0x198
> > .__slab_alloc+0x1ec/0x430
> > .kmem_cache_alloc+0x7c/0xe0
> > .radix_tree_preload+0x94/0x140
> > .add_to_page_cache_locked+0x70/0x1f0
> > .add_to_page_cache_lru+0x50/0xac
> > .mpage_readpages+0xcc/0x198
> > .ext3_readpages+0x28/0x40
> > .__do_page_cache_readahead+0x1ac/0x2ac
> > .ra_submit+0x28/0x38
>
> Something is odd right here. I would have expected entries in the
> calli stack containing
>
> .ondemand_readahead
> .page_cache_sync_readahead
>
> I am going to have to assume these functions were really called
> otherwise the ra_submit is a mystery :(
>
> > .do_generic_file_read+0xe8/0x504
> > .generic_file_aio_read+0x214/0x29c
> > .do_sync_read+0xac/0x10c
> > .vfs_read+0xd0/0x1a0
> > .SyS_read+0x58/0xa0
> > syscall_exit+0x0/0x40
> >
> > I haven't made a patch for it, just hacked unmap_and_move() to say
> > "if (!0)" instead of "if (!force)" to get on with my testing. I expect
> > you'll want to pass another arg down to migrate_pages() to prevent
> > setting force, or give it some flags, or do something with PF_MEMALLOC.
> >
>
> I tend to agree but I'm failing to see how it might be happening right now.
> The callchain looks something like
>
> do_generic_file_read
> # page is not found
> page_cache_sync_readahead
> ondemand_readahead
> ra_submit
> __do_page_cache_readahead
> # Allocates a bunch of pages
> # Sets PageReadahead. Otherwise the pages initialised
> # and they are not on the LRU yet
> read_pages
> # Calls mapping->readpages which calls mpage_readpages
> mpage_readpages
> # For the list of pages (index initialised), add
> # each of them to the LRU. Adding to the LRU
> # locks the page and should return the page
> # locked.
> add_to_page_cache_lru
> # sets PageSwapBacked
> add_to_page_cache
> # locks page
> add_to_page_cache_locked
> # preloads radix tree
> radix_tree_preload
> # DEADLOCK HERE. This is what does not
> # make sense. Compaction could not be
> # be finding the page on the LRU as
> # lru_cache_add_file() has not been
> # called yet for this page
>
> So I don't think we are locking on the same page.
>
> Here is a possibility. mpage_readpages() is reading ahead so there are obviously
> pages that are not Uptodate. It queues these for asynchronous read with
> block_read_full_page(), returns and adds the page to the LRU (so compaction
> is now able to find it). IO starts at some time in the future with the page
> still locked and gets unlocked at the end of IO by end_buffer_async_read().
>
> Between when IO is queued and it completes, a new page is being added to
> the LRU, the radix tree is loaded and compaction kicks off trying to
> lock the same page that is not up to date yet. Something is preventing
> the IO completing and the page being unlocked but I'm missing what that
> might be.
>
> Does this sound plausible? I'll keep looking but I wanted to see if
> someone spotted quickly a major flaw in the reasoning or have a quick
> guess as to why the page might not be getting unlocked at the end of IO
> properly.
>
I've still not figured out why the pages are not getting unlocked again
but can you try the following bodge patch to confirm it's something to
do with !Uptodate pages please?
==== CUT HERE ====
diff --git a/mm/migrate.c b/mm/migrate.c
index e12f717..66a2aac 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -612,6 +612,23 @@ static int move_to_new_page(struct page *newpage, struct page *page,
}
/*
+ * HACK
+ * Do not allow direct compaction to migrate !Uptodate pages. It is possible
+ * the caller has locked the page for IO to complete and for some reason the
+ * IO is not completing and unlocking the page leading to deadlock
+ */
+static inline bool safe_to_lock_page(struct page *page)
+{
+ if (!(current->flags & PF_MEMALLOC))
+ return true;
+
+ if (!PageUptodate(page))
+ return false;
+
+ return true;
+}
+
+/*
* Obtain the lock on page, remove all ptes and migrate the page
* to the newly allocated page in newpage.
*/
@@ -642,7 +659,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
rc = -EAGAIN;
if (!trylock_page(page)) {
- if (!force)
+ if (!force || !safe_to_lock_page(page))
goto move_newpage;
lock_page(page);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-01-07 17:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-07 1:20 mmotm hangs on compaction lock_page Hugh Dickins
2011-01-07 14:52 ` Mel Gorman
2011-01-07 17:57 ` Mel Gorman [this message]
2011-01-10 17:26 ` Mel Gorman
2011-01-10 23:56 ` Hugh Dickins
2011-01-11 2:14 ` KAMEZAWA Hiroyuki
2011-01-11 5:12 ` Minchan Kim
2011-01-11 2:34 ` Minchan Kim
2011-01-11 11:45 ` Mel Gorman
2011-01-11 14:09 ` Minchan Kim
2011-01-11 20:41 ` Hugh Dickins
2011-01-12 9:22 ` Mel Gorman
2011-01-11 20:45 ` Andrew Morton
2011-01-11 21:03 ` Hugh Dickins
2011-01-11 21:13 ` Andrew Morton
2011-01-12 9:25 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110107175705.GL29257@csn.ul.ie \
--to=mel@csn.ul.ie \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).