From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>,
Josef Bacik <josef@toxicpanda.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Minchan Kim <minchan@kernel.org>
Subject: [PATCH v2 2/2] mm: fix long time stall from mm_populate
Date: Fri, 14 Feb 2020 11:29:51 -0800 [thread overview]
Message-ID: <20200214192951.29430-2-minchan@kernel.org> (raw)
In-Reply-To: <20200214192951.29430-1-minchan@kernel.org>
Basically, fault handler releases mmap_sem before requesting readahead
and then it is supposed to retry lookup the page from page cache with
FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry.
However, what happens if the fault handler find a page from page
cache and the page has readahead marker but are waiting under
writeback? Plus one more condition, it happens under mm_populate
which repeats faulting unless it encounters error. So let's assemble
conditions below.
CPU 1 CPU 2
- first loop
mm_populate
for ()
..
ret = populate_vma_page_range
__get_user_pages
faultin_page
handle_mm_fault
filemap_fault
do_async_mmap_readahead
if (PageReadahead(pageA))
maybe_unlock_mmap_for_io
up_read(mmap_sem)
shrink_page_list
pageout
SetPageReclaim(=SetPageReadahead)(pageA)
writepage
SetPageWriteback(pageA)
page_cache_async_readahead()
ClearPageReadahead(pageA)
do_async_mmap_readahead
lock_page_maybe_drop_mmap
goto out_retry
the pageA is reclaimed
and new pageB is populated to the file offset
and finally has become PG_readahead
- second loop
__get_user_pages
faultin_page
handle_mm_fault
filemap_fault
do_async_mmap_readahead
if (PageReadahead(pageB))
maybe_unlock_mmap_for_io
up_read(mmap_sem)
shrink_page_list
pageout
SetPageReclaim(=SetPageReadahead)(pageB)
writepage
SetPageWriteback(pageB)
page_cache_async_readahead()
ClearPageReadahead(pageB)
do_async_mmap_readahead
lock_page_maybe_drop_mmap
goto out_retry
It could be repeated forever so it's livelock. without involving reclaim,
it could happens if ra_pages become zero by fadvise/other threads who
have same fd one doing randome while the other one is sequential
because page_cache_async_readahead has following condition check like
PageWriteback and ra_pages are never synchrnized with fadvise and
shrink_readahead_size_eio from other threads.
void page_cache_async_readahead(struct address_space *mapping,
unsigned long req_size)
{
/* no read-ahead */
if (!ra->ra_pages)
return;
Thus, we need to limit fault retry from mm_populate like page
fault handler.
Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/gup.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..6f6548c63ad5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
*
* This takes care of mlocking the pages too if VM_LOCKED is set.
*
- * return 0 on success, negative error code on error.
+ * return number of pages pinned on success, negative error code on error.
*
* vma->vm_mm->mmap_sem must be held.
*
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
struct vm_area_struct *vma = NULL;
int locked = 0;
long ret = 0;
+ bool tried = false;
end = start + len;
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
* double checks the vma flags, so that it won't mlock pages
* if the vma was already munlocked.
*/
- ret = populate_vma_page_range(vma, nstart, nend, &locked);
+ ret = populate_vma_page_range(vma, nstart, nend,
+ tried ? NULL : &locked);
if (ret < 0) {
if (ignore_errors) {
ret = 0;
continue; /* continue at next VMA */
}
break;
- }
+ } else if (ret == 0)
+ tried = true;
+ else
+ tried = false;
nend = nstart + ret * PAGE_SIZE;
ret = 0;
}
--
2.25.0.265.gbab2e86ba0-goog
next prev parent reply other threads:[~2020-02-14 19:30 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-14 19:29 [PATCH v2 1/2] mm: make PageReadahead more strict Minchan Kim
2020-02-14 19:29 ` Minchan Kim [this message]
2020-02-21 17:50 ` [PATCH v2 2/2] mm: fix long time stall from mm_populate Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200214192951.29430-2-minchan@kernel.org \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.