From: Minchan Kim <minchan@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>, Josef Bacik <josef@toxicpanda.com>,
Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: fix long time stall from mm_populate
Date: Tue, 11 Feb 2020 08:34:04 -0800
Message-ID: <20200211163404.GC242563@google.com>
In-Reply-To: <20200211122323.GS8731@bombadil.infradead.org>
On Tue, Feb 11, 2020 at 04:23:23AM -0800, Matthew Wilcox wrote:
> On Mon, Feb 10, 2020 at 08:25:36PM -0800, Minchan Kim wrote:
> > On Mon, Feb 10, 2020 at 07:54:12PM -0800, Matthew Wilcox wrote:
> > > On Mon, Feb 10, 2020 at 07:50:04PM -0800, Minchan Kim wrote:
> > > > On Mon, Feb 10, 2020 at 05:10:21PM -0800, Matthew Wilcox wrote:
> > > > > On Mon, Feb 10, 2020 at 04:19:58PM -0800, Minchan Kim wrote:
> > > > > > filemap_fault
> > > > > >   find a page from the page cache (PG_uptodate|PG_readahead|PG_writeback)
> > > > >
> > > > > Uh ... That shouldn't be possible.
> > > >
> > > > Please see shrink_page_list. Vmscan uses PG_reclaim to accelerate
> > > > page reclaim when the writeback is done, so the page will have both
> > > > flags at the same time, and PG_reclaim could be regarded as
> > > > PG_readahead in fault context.
> > >
> > > What part of fault context can make that mistake? The snippet I quoted
> > > below is from page_cache_async_readahead() where it will clearly not
> > > make that mistake. There's a lot of code here; please don't presume I
> > > know all the areas you're talking about.
> >
> > Sorry for not being clear. I am talking about filemap_fault ->
> > do_async_mmap_readahead.
> >
> > Let's assume the page is hit in the page cache and vmf->flags is
> > !FAULT_FLAG_TRIED, so it calls do_async_mmap_readahead. Since the page has
> > PG_reclaim and PG_writeback set by shrink_page_list, it goes to
> >
> > do_async_mmap_readahead
> >   if (PageReadahead(page))
> >     fpin = maybe_unlock_mmap_for_io();
> >     page_cache_async_readahead
> >       if (PageWriteback(page))
> >         return;
> >       ClearPageReadahead(page); <- doesn't reach here until the writeback is clear
> >
> > So, mm_populate will repeat the loop until the writeback is done.
> > It's just my theory; I didn't confirm it by testing.
> > If I'm missing something obvious, let me know.
>
> Ah! Surely the right way to fix this is ...
I'm not sure it's the right fix. Actually, I wanted to remove the PageWriteback
check in page_cache_async_readahead because I don't see the correlation. From a
design point of view, why couldn't we do readahead if the marker page is
PG_readahead|PG_writeback? The only reason I can think of is that *a page*
would be delayed in being freed, since we would have cleared the PG_reclaim
bit, and guarding against that looks like over-optimization to me.
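To make the flag sharing concrete, here is a minimal user-space sketch, not kernel code. It assumes, as described earlier in the thread, that the readahead marker and the reclaim hint occupy the same flag bit. All `model_*` names and the bit positions are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Illustrative bit positions; they do not match the kernel's layout. */
enum model_pageflags {
	MODEL_PG_writeback = 1 << 0,
	MODEL_PG_reclaim   = 1 << 1,
	/* Assumption from the thread: readahead reuses the reclaim bit. */
	MODEL_PG_readahead = MODEL_PG_reclaim,
};

struct model_page {
	unsigned int flags;
};

static bool model_page_readahead(const struct model_page *page)
{
	return (page->flags & MODEL_PG_readahead) != 0;
}

static bool model_page_reclaim(const struct model_page *page)
{
	return (page->flags & MODEL_PG_reclaim) != 0;
}

/* Clearing the readahead marker also drops the reclaim hint. */
static void model_clear_page_readahead(struct model_page *page)
{
	page->flags &= ~(unsigned int)MODEL_PG_readahead;
}

/* Roughly what vmscan does to a page it sends to writeback for reclaim. */
static void model_start_reclaim_writeback(struct model_page *page)
{
	page->flags |= MODEL_PG_writeback | MODEL_PG_reclaim;
}
```

In this model a page marked by `model_start_reclaim_writeback()` also tests true in `model_page_readahead()`, and `model_clear_page_readahead()` removes the reclaim hint while leaving writeback set. That lost hint is the only cost of dropping the PageWriteback check that the paragraph above identifies.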

Another concern: isn't it racy? IOW, the page was !PG_writeback at the check
in your snippet below, but it was under PG_writeback by the time we reached
page_cache_async_readahead, and then the IO completed before the refault
reached this code again. That could repeat *theoretically*, even though it's
very unlikely to happen in practice.
Thus, I think it would be better to remove the PageWriteback check from
page_cache_async_readahead if we really want to go with this approach.
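The stall and the effect of dropping the check can be sketched with a small user-space simulation. This is a simplified model under stated assumptions, not the kernel code path: every `sim_*` name is hypothetical, "time" advances by one tick per fault attempt, and writeback completion clears only the writeback bit.

```c
#include <stdbool.h>

enum { SIM_PG_writeback = 1 << 0, SIM_PG_readahead = 1 << 1 };

struct sim_page {
	unsigned int flags;
	int writeback_ticks;	/* fault attempts that pass before the IO completes */
};

/* Models page_cache_async_readahead(): bails out while writeback is pending. */
static void sim_async_readahead(struct sim_page *page, bool check_writeback)
{
	if (check_writeback && (page->flags & SIM_PG_writeback))
		return;		/* marker stays set */
	page->flags &= ~(unsigned int)SIM_PG_readahead;
}

/* Models one fault on a cached page; returns true if the caller must retry. */
static bool sim_fault(struct sim_page *page, bool check_writeback)
{
	bool retry = false;

	if (page->flags & SIM_PG_readahead) {
		retry = true;	/* the lock was dropped for IO, so retry */
		sim_async_readahead(page, check_writeback);
	}
	/* One tick passes; when the counter hits zero the IO is done. */
	if (page->writeback_ticks > 0 && --page->writeback_ticks == 0)
		page->flags &= ~(unsigned int)SIM_PG_writeback;
	return retry;
}

/* Models the populate loop; returns how many fault attempts were needed. */
static int sim_populate(struct sim_page *page, bool check_writeback)
{
	int attempts = 1;

	while (sim_fault(page, check_writeback))
		attempts++;
	return attempts;
}
```

With the check in place, the attempt count in this model grows with `writeback_ticks`. With `check_writeback` set to false, the marker is cleared on the first attempt and the second attempt succeeds, however long the IO takes.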

However, page_cache_async_readahead has another condition for bailing out:
ra_pages. I think that is also racy against fadvise or other tasks shrinking
the window size. That's why I thought a second try with non-fault retry logic
in the caller would fix all potential issues at once, like the page fault
handler does.
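One way to read the retry-once idea is the following user-space sketch. It is an interpretation, not the actual patch: the `tried_*` names and `TRIED_FAULT_FLAG_TRIED` are hypothetical stand-ins, and the only behavior taken from the thread is that a fault with the TRIED flag set does not call the async readahead path.

```c
#include <stdbool.h>

enum { TRIED_PG_writeback = 1 << 0, TRIED_PG_readahead = 1 << 1 };
enum { TRIED_FAULT_FLAG_TRIED = 1 << 0 };

struct tried_page {
	unsigned int flags;
};

/* Models one fault on a cached page; returns true if the caller must retry. */
static bool tried_fault(struct tried_page *page, unsigned int fault_flags)
{
	/* A second attempt skips the async readahead path entirely. */
	if (fault_flags & TRIED_FAULT_FLAG_TRIED)
		return false;
	if (!(page->flags & TRIED_PG_readahead))
		return false;
	/* Async readahead bails out on writeback, leaving the marker set. */
	if (!(page->flags & TRIED_PG_writeback))
		page->flags &= ~(unsigned int)TRIED_PG_readahead;
	return true;
}

/* Caller-side loop: retry at most once, marking the second attempt as TRIED. */
static int tried_populate(struct tried_page *page)
{
	unsigned int fault_flags = 0;
	int attempts = 0;

	for (;;) {
		attempts++;
		if (!tried_fault(page, fault_flags))
			return attempts;
		fault_flags |= TRIED_FAULT_FLAG_TRIED;
	}
}
```

The bound here does not depend on why the first attempt bailed out (writeback, ra_pages, or anything else), which is the appeal of handling it in the caller rather than patching each bail-out condition.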
>
> +++ b/mm/filemap.c
> @@ -2420,7 +2420,7 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
>  		return fpin;
>  	if (ra->mmap_miss > 0)
>  		ra->mmap_miss--;
> -	if (PageReadahead(page)) {
> +	if (!PageWriteback(page) && PageReadahead(page)) {
>  		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
>  		page_cache_async_readahead(mapping, ra, file,
>  					   page, offset, ra->ra_pages);
>