Re: [PATCH v2] mm: fix long time stall from mm_populate

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2] mm: fix long time stall from mm_populate
Date: Fri, 13 Mar 2020 14:14:16 -0700	[thread overview]
Message-ID: <20200313211416.GC78185@google.com> (raw)
In-Reply-To: <20200303002638.206421-1-minchan@kernel.org>

Hi Andrew,

Any chance to take a look?

On Mon, Mar 02, 2020 at 04:26:38PM -0800, Minchan Kim wrote:
> Basically, fault handler releases mmap_sem before requesting readahead
> and then it is supposed to retry lookup the page from page cache with
> FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry.
> 
> However, what happens if the fault handler find a page from page
> cache and the page has readahead marker but are waiting under
> writeback? Plus one more condition, it happens under mm_populate
> which repeats faulting unless it encounters error. So let's assemble
> conditions below.
> 
>        CPU 1                                                        CPU 2
> 
> - first loop
>     mm_populate
>      for ()
>        ..
>        ret = populate_vma_page_range
>          __get_user_pages
>            faultin_page
>              handle_mm_fault
>                filemap_fault
>                  do_async_mmap_readahead
>                    if (PageReadahead(pageA))
>                      maybe_unlock_mmap_for_io
>                        up_read(mmap_sem)
> 					                    shrink_page_list
>                                                               pageout
>                                                                 SetPageReclaim(=SetPageReadahead)(pageA)
>                                                                 writepage
>                                                                   SetPageWriteback(pageA)
> 
>                      page_cache_async_readahead()
> 		       ClearPageReadahead(pageA)
>                  do_async_mmap_readahead
> 		 lock_page_maybe_drop_mmap
> 		   goto out_retry
> 
> 					                    the pageA is reclaimed
> 							    and new pageB is populated to the file offset
> 							    and finally has become PG_readahead
> 
> - second loop
> 
> 	  __get_user_pages
>            faultin_page
>              handle_mm_fault
>                filemap_fault
>                  do_async_mmap_readahead
>                    if (PageReadahead(pageB))
>                      maybe_unlock_mmap_for_io
>                        up_read(mmap_sem)
> 					                    shrink_page_list
>                                                               pageout
>                                                                 SetPageReclaim(=SetPageReadahead)(pageB)
>                                                                 writepage
>                                                                   SetPageWriteback(pageB)
> 
>                      page_cache_async_readahead()
> 		       ClearPageReadahead(pageB)
>                  do_async_mmap_readahead
> 		 lock_page_maybe_drop_mmap
> 		   goto out_retry
> 
> It could be repeated forever so it's livelock. Without involving reclaim,
> it could happens if ra_pages become zero by fadvise/other threads who
> have same fd one doing randome while the other one is sequential
> because page_cache_async_readahead has following condition check like
> PageWriteback and ra_pages are never synchrnized with fadvise and
> shrink_readahead_size_eio from other threads.
> 
> page_cache_async_readahead(struct address_space *mapping,
>                            unsigned long req_size)
> {
>         /* no read-ahead */
>         if (!ra->ra_pages)
>                 return;
> 
> Thus, we need to limit fault retry from mm_populate like page
> fault handler.
> 
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/gup.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1b521e0ac1de..6f6548c63ad5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>   *
>   * This takes care of mlocking the pages too if VM_LOCKED is set.
>   *
> - * return 0 on success, negative error code on error.
> + * return number of pages pinned on success, negative error code on error.
>   *
>   * vma->vm_mm->mmap_sem must be held.
>   *
> @@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>  	struct vm_area_struct *vma = NULL;
>  	int locked = 0;
>  	long ret = 0;
> +	bool tried = false;
>  
>  	end = start + len;
>  
> @@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>  		 * double checks the vma flags, so that it won't mlock pages
>  		 * if the vma was already munlocked.
>  		 */
> -		ret = populate_vma_page_range(vma, nstart, nend, &locked);
> +		ret = populate_vma_page_range(vma, nstart, nend,
> +						tried ? NULL : &locked);
>  		if (ret < 0) {
>  			if (ignore_errors) {
>  				ret = 0;
>  				continue;	/* continue at next VMA */
>  			}
>  			break;
> -		}
> +		} else if (ret == 0)
> +			tried = true;
> +		else
> +			tried = false;
>  		nend = nstart + ret * PAGE_SIZE;
>  		ret = 0;
>  	}
> -- 
> 2.25.0.265.gbab2e86ba0-goog
>

     prev parent reply	other threads:[~2020-03-13 21:14 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-03  0:26 [PATCH v2] mm: fix long time stall from mm_populate Minchan Kim
2020-03-03  1:22 ` Matthew Wilcox
2020-03-03 21:18   ` Minchan Kim
2020-03-13 21:14 ` Minchan Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200313211416.GC78185@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.