From: Andrew Morton <akpm@linux-foundation.org>
To: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	linux-mm <linux-mm@kvack.org>, Josef Bacik <josef@toxicpanda.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: fix long time stall from mm_populate
Date: Wed, 12 Feb 2020 18:00:30 -0800
Message-ID: <20200212180030.a89da9c4cf2b9d11efcc25db@linux-foundation.org>
In-Reply-To: <20200212231210.GA233109@google.com>

On Wed, 12 Feb 2020 15:12:10 -0800 Minchan Kim <minchan@kernel.org> wrote:

> On Wed, Feb 12, 2020 at 02:24:35PM -0800, Andrew Morton wrote:
> > On Wed, 12 Feb 2020 11:53:22 -0800 Minchan Kim <minchan@kernel.org> wrote:
> > 
> > > > That's definitely wrong.  It'll clear PageReclaim and then pretend it did
> > > > nothing wrong.
> > > > 
> > > > 	return !PageWriteback(page) ||
> > > > 		test_and_clear_bit(PG_reclaim, &page->flags);
> > > > 
> > > 
> > > Much better. Thanks for the review, Matthew!
> > > If there is no objection, I will send two patches to Andrew.
> > > One makes the PageReadahead check strict; the other limits retries
> > > in mm_populate.
> > 
> > With much more detailed changelogs, please!
> > 
> > This all seems rather screwy.  If a page is under writeback then it is
> > uptodate and we should be able to fault it in immediately.
> 
> Hi Andrew,
> 
> Will this description work as the cover letter? If so, I will add each
> part below to the corresponding patch.
> 
> Subject: [PATCH 0/3] fixing mm_populate long stall
> 
> I got several reports that a major page fault sometimes takes several
> seconds. While reviewing the code that drops mmap_sem in the page fault
> handler, I found several bugs.
> 
>    CPU 1							CPU 2
> mm_populate
>  for ()
>    ..
>    ret = populate_vma_page_range
>      __get_user_pages
>        faultin_page
>          handle_mm_fault
> 	   filemap_fault
> 	     do_async_mmap_readahead
> 	     						shrink_page_list
> 							  pageout
> 							    SetPageReclaim(=SetPageReadahead)
> 							      writepage
> 							        SetPageWriteback
> 	       if (PageReadahead(page))
> 	         maybe_unlock_mmap_for_io
> 		   up_read(mmap_sem)
> 		 page_cache_async_readahead()
> 		   if (PageWriteback(page))
> 		     return;
> 
>     Here, since ret from populate_vma_page_range is zero,
>     the loop continues to run with the same address as the previous
>     iteration. It will repeat the loop until the page's
>     writeout is done (i.e., PG_writeback or PG_reclaim is cleared).

The populate_vma_page_range() kerneldoc is wrong.  "return 0 on
success, negative error code on error".  Care to fix that please?
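
To make the stall concrete, here is a minimal user-space sketch of the
loop behaviour described in the quoted trace.  It is not kernel code: the
sketch_* names and the "ticks until writeback completes" counter are
hypothetical, and it assumes (as the trace implies) that the callee
reports the number of pages populated, so a return of zero means "no
progress" rather than "success".

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for a page whose writeback ends after some ticks. */
struct sketch_state {
	unsigned int writeback_ticks_left;
};

/*
 * Hypothetical model of populate_vma_page_range(): returns the number of
 * pages populated, or 0 when the fault path bailed out early because the
 * page is still under writeback.
 */
static long sketch_populate_range(struct sketch_state *st,
				  unsigned long start, unsigned long end)
{
	if (st->writeback_ticks_left > 0) {
		st->writeback_ticks_left--;
		return 0;	/* no progress this time */
	}
	return (long)((end - start) / SKETCH_PAGE_SIZE);
}

/* Model of the caller's loop; returns how many calls made no progress. */
static unsigned long sketch_mm_populate(struct sketch_state *st,
					unsigned long start, unsigned long end)
{
	unsigned long nstart = start, wasted = 0;

	assert(start <= end);
	while (nstart < end) {
		long ret = sketch_populate_range(st, nstart, end);

		if (ret < 0)
			break;
		if (ret == 0)
			wasted++;	/* the same address is retried */
		nstart += (unsigned long)ret * SKETCH_PAGE_SIZE;
	}
	return wasted;
}
```

With writeback_ticks_left set to 5, sketch_mm_populate() makes five calls
at the same address before it advances.  In the real case each of those
calls is a full trip through the fault path.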

> We could fix the above specific case by adding a PageWriteback check. IOW,
> 
>    ret = populate_vma_page_range
>    	   ...
> 	   ...
> 	   filemap_fault
> 	     do_async_mmap_readahead
> 	       if (!PageWriteback(page) && PageReadahead(page))
> 	         maybe_unlock_mmap_for_io
> 		   up_read(mmap_sem)
> 		 page_cache_async_readahead()
> 		   if (PageWriteback(page))
> 		     return;

Well yes, but the testing of PageWriteback() is a hack added in
fe3cba17c49471 to permit the sharing of PG_reclaim and PG_readahead. 
If we didn't need that hack then we could avoid adding new hacks to
hack around the old hack :(.  Have you considered anything along those
lines?  Rework how we handle PG_reclaim/PG_readahead?
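
For readers who have not seen the bit sharing, a rough user-space sketch
follows.  The sk_* names and bit numbers are hypothetical and the real
page-flag machinery differs; the only point is that one bit carries two
meanings and PageWriteback() is what tells them apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; only the aliasing matters here. */
enum sk_pageflags {
	SK_PG_writeback = 0,
	SK_PG_reclaim   = 1,
	SK_PG_readahead = SK_PG_reclaim,	/* same bit, second meaning */
};

struct sk_page {
	unsigned long flags;
};

static bool sk_test(const struct sk_page *p, int bit)
{
	assert(bit == SK_PG_writeback || bit == SK_PG_reclaim);
	return (p->flags >> bit) & 1UL;
}

static void sk_set(struct sk_page *p, int bit)
{
	assert(bit == SK_PG_writeback || bit == SK_PG_reclaim);
	p->flags |= 1UL << bit;
}

/* Models pageout(): mark the page for reclaim, then start writeback. */
static void sk_pageout(struct sk_page *p)
{
	sk_set(p, SK_PG_reclaim);
	sk_set(p, SK_PG_writeback);
}

/* Naive test: cannot tell a reclaim mark from a readahead marker. */
static bool sk_readahead_naive(const struct sk_page *p)
{
	return sk_test(p, SK_PG_readahead);
}

/* Disambiguated test: the bit means readahead only when not in writeback. */
static bool sk_readahead_strict(const struct sk_page *p)
{
	return !sk_test(p, SK_PG_writeback) && sk_test(p, SK_PG_readahead);
}
```

After sk_pageout(), the naive test reports a readahead marker that was
never set, while the strict test does not.  Every reader of the shared bit
has to remember the extra check, which is the cost of the sharing.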

> That is what [3/3] fixes. Although it fixes the problem in practice,
> a livelock is still theoretically possible, because the page at the
> faulting address could be reclaimed and then reallocated and become a
> readahead marker on another CPU while the faulting process is retrying
> in mm_populate's loop.

Really?  filemap_fault()'s

	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
		goto out_retry;

	/* Did it get truncated? */
	if (unlikely(compound_head(page)->mapping != mapping)) {
		unlock_page(page);
		put_page(page);
		goto retry_find;
	}

should handle such cases?

> [2/3] fixes such a livelock by limiting the retry count.

I wouldn't call that "fixing" :(

> Besides PageWriteback, there is another hole that can livelock or hang
> the process: ra_pages.
> 
> mm_populate
>  for ()
>    ..
>    ret = populate_vma_page_range
>      __get_user_pages
>        faultin_page
>          handle_mm_fault
> 	   filemap_fault
> 	     do_async_mmap_readahead
> 	       if (PageReadahead(page))
> 	         maybe_unlock_mmap_for_io
> 		   up_read(mmap_sem)
> 		 page_cache_async_readahead()
> 		   if (!ra->ra_pages)
> 		     return;
> 
> It will repeat the loop until ra->ra_pages becomes non-zero.
> [1/3] fixes this problem.
> 


