All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <songliubraving@fb.com>,
	linux-mm@kvack.org, kernel-team@fb.com,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] mm/thp: flush file for !is_shmem PageDirty() case in collapse_file()
Date: Fri, 1 Nov 2019 14:31:37 -0400	[thread overview]
Message-ID: <20191101183137.GA16977@cmpxchg.org> (raw)
In-Reply-To: <20191030185115.ad780e74c66b6286789559fd@linux-foundation.org>

On Wed, Oct 30, 2019 at 06:51:15PM -0700, Andrew Morton wrote:
> To that end, here's the combination of
> mmthp-recheck-each-page-before-collapsing-file-thp.patch and this
> patch:
> 
> From: Song Liu <songliubraving@fb.com>
> Subject: mm,thp: recheck each page before collapsing file THP
> 
> In collapse_file(), for !is_shmem case, current check cannot guarantee
> the locked page is up-to-date.  Specifically, xas_unlock_irq() should
> not be called before lock_page() and get_page(); and it is necessary to
> recheck PageUptodate() after locking the page.  
> 
> With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
> may contain corrupted data.  This is because khugepaged mistakenly
> collapses some not up-to-date sub pages into a huge page, and assumes
> the huge page is up-to-date.  This will NOT corrupt data in the disk,
> because the page is read-only and never written back.  Fix this by
> properly checking PageUptodate() after locking the page.  This check
> replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".  
> 
> Also, move PageDirty() check after locking the page.  Current
> khugepaged should not try to collapse dirty file THP, because it is
> limited to read-only .text.  Add a warning with the PageDirty() check
> as it should not happen.  This warning is added after page_mapping()
> check, because if the page is truncated, it might be dirty.
> 
> [songliubraving@fb.com: flush file for !is_shmem PageDirty() case in collapse_file()]
>   Link: http://lkml.kernel.org/r/20191030200736.3455046-1-songliubraving@fb.com
> [songliubraving@fb.com: v4]
>   Link: http://lkml.kernel.org/r/20191022191006.411277-1-songliubraving@fb.com
> [songliubraving@fb.com: fix deadlock in collapse_file()]
>   Link: http://lkml.kernel.org/r/20191028221414.3685035-1-songliubraving@fb.com
> Link: http://lkml.kernel.org/r/20191018180345.4188310-1-songliubraving@fb.com
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: William Kucharski <william.kucharski@oracle.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/khugepaged.c |   49 +++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 40 insertions(+), 9 deletions(-)
> 
> --- a/mm/khugepaged.c~mmthp-recheck-each-page-before-collapsing-file-thp
> +++ a/mm/khugepaged.c
> @@ -1601,17 +1601,33 @@ static void collapse_file(struct mm_stru
>  					result = SCAN_FAIL;
>  					goto xa_unlocked;
>  				}
> -			} else if (!PageUptodate(page)) {
> -				xas_unlock_irq(&xas);
> -				wait_on_page_locked(page);
> -				if (!trylock_page(page)) {
> -					result = SCAN_PAGE_LOCK;
> -					goto xa_unlocked;
> +				if (WARN_ON_ONCE(PageDirty(page))) {
> +					/*
> +					 * page from readahead should not
> +					 * be dirty. Show warning if this
> +					 * somehow happens.
> +					 */
> +					result = SCAN_FAIL;
> +					goto out_unlock;

I'm not sure we need this check here. We have a dirty check on the
locked page below, and it's not clear why there is a suspicion of
readahead producing dirty pages...?

>  				}
> -				get_page(page);
>  			} else if (PageDirty(page)) {
> +				/*
> +				 * khugepaged only works on read-only fd,
> +				 * so this page is dirty because it hasn't
> +				 * been flushed since first write. There
> +				 * won't be new dirty pages.
> +				 *
> +				 * Trigger async flush here and hope the
> +				 * writeback is done when khugepaged
> +				 * revisits this page.
> +				 *
> +				 * This is a one-off situation. We are not
> +				 * forcing writeback in loop.
> +				 */
> +				xas_unlock_irq(&xas);
> +				filemap_flush(mapping);
>  				result = SCAN_FAIL;
> -				goto xa_locked;
> +				goto xa_unlocked;

We should do this to improve the file THP success rate. But we
shouldn't conflate it with the data corruption bug fixed in the hunks
below:

>  			} else if (trylock_page(page)) {
>  				get_page(page);
>  				xas_unlock_irq(&xas);
> @@ -1626,7 +1642,12 @@ static void collapse_file(struct mm_stru
>  		 * without racing with truncate.
>  		 */
>  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> -		VM_BUG_ON_PAGE(!PageUptodate(page), page);
> +
> +		/* double check the page is up to date */
> +		if (unlikely(!PageUptodate(page))) {
> +			result = SCAN_FAIL;
> +			goto out_unlock;
> +		}

This...

> @@ -1642,6 +1663,16 @@ static void collapse_file(struct mm_stru
>  			goto out_unlock;
>  		}
>  
> +		if (!is_shmem && PageDirty(page)) {
> +			/*
> +			 * khugepaged only works on read-only fd, so this
> +			 * page is dirty because it hasn't been flushed
> +			 * since first write.
> +			 */
> +			result = SCAN_FAIL;
> +			goto out_unlock;
> +		}

...and this are pretty urgent chunks because they protect the
integrity of the page cache. They should go in asap.

So I suggest we do two patches instead:

1. These two locked page checks to fix the corruption in 5.4
2. The filemap_flush() bit to improve THP coverage in 5.5


  parent reply	other threads:[~2019-11-01 18:31 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-30 20:07 [PATCH] mm/thp: flush file for !is_shmem PageDirty() case in collapse_file() Song Liu
2019-10-31  1:51 ` Andrew Morton
2019-10-31  4:55   ` Song Liu
2019-10-31 18:10   ` Song Liu
2019-11-01 18:31   ` Johannes Weiner [this message]
2019-11-01 21:48     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191101183137.GA16977@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kernel-team@fb.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=songliubraving@fb.com \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.