All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch
Date: Mon, 11 Jan 2016 16:29:31 -0800	[thread overview]
Message-ID: <20160111162931.0bea916e.akpm@linux-foundation.org> (raw)
In-Reply-To: <56943D00.7090405@oracle.com>

On Mon, 11 Jan 2016 15:38:40 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 01/11/2016 02:35 PM, Andrew Morton wrote:
> > On Wed,  6 Jan 2016 14:37:04 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > 
> >> Page faults can race with fallocate hole punch.  If a page fault happens
> >> between the unmap and remove operations, the page is not removed and
> >> remains within the hole.  This is not the desired behavior.  The race
> >> is difficult to detect in user level code as even in the non-race
> >> case, a page within the hole could be faulted back in before fallocate
> >> returns.  If userfaultfd is expanded to support hugetlbfs in the future,
> >> this race will be easier to observe.
> >>
> >> If this race is detected and a page is mapped, the remove operation
> >> (remove_inode_hugepages) will unmap the page before removing.  The unmap
> >> within remove_inode_hugepages occurs with the hugetlb_fault_mutex held
> >> so that no other faults will be processed until the page is removed.
> >>
> >> The (unmodified) routine hugetlb_vmdelete_list was moved ahead of
> >> remove_inode_hugepages to satisfy the new reference.
> >>
> >> ...
> >>
> >> --- a/fs/hugetlbfs/inode.c
> >> +++ b/fs/hugetlbfs/inode.c
> >>
> >> ...
> >>
> >> @@ -395,37 +431,43 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> >>  							mapping, next, 0);
> >>  			mutex_lock(&hugetlb_fault_mutex_table[hash]);
> >>  
> >> -			lock_page(page);
> >> -			if (likely(!page_mapped(page))) {
> > 
> > hm, what are the locking requirements for page_mapped()?
> 
> page_mapped is just reading/evaluating an atomic within the struct page
> which we have a referene on from the pagevec_lookup.  But, I think the
> question is what prevents page_mapped from changing after we check it?
> 
> The patch takes the hugetlb_fault_mutex_table lock before checking
> page_mapped.  If the page is unmapped and the hugetlb_fault_mutex_table
> is held, it can not be faulted in and change from unmapped to mapped.
> 
> The new comment in the patch about taking hugetlb_fault_mutex_table is
> right before the check for page_mapped.

OK, thanks.

> > 
> >> -				bool rsv_on_error = !PagePrivate(page);
> >> -				/*
> >> -				 * We must free the huge page and remove
> >> -				 * from page cache (remove_huge_page) BEFORE
> >> -				 * removing the region/reserve map
> >> -				 * (hugetlb_unreserve_pages).  In rare out
> >> -				 * of memory conditions, removal of the
> >> -				 * region/reserve map could fail.  Before
> >> -				 * free'ing the page, note PagePrivate which
> >> -				 * is used in case of error.
> >> -				 */
> >> -				remove_huge_page(page);
> > 
> > And remove_huge_page().
> 
> The page must be locked before calling remove_huge_page, since it will
> call delete_from_page_cache.  It currently is locked.  Would you prefer
> a comment stating this before the call?

No, that doesn't seem nevessary.

I'll mark this patch as "pending, awaiting Mike's go-ahead".

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch
Date: Mon, 11 Jan 2016 16:29:31 -0800	[thread overview]
Message-ID: <20160111162931.0bea916e.akpm@linux-foundation.org> (raw)
In-Reply-To: <56943D00.7090405@oracle.com>

On Mon, 11 Jan 2016 15:38:40 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 01/11/2016 02:35 PM, Andrew Morton wrote:
> > On Wed,  6 Jan 2016 14:37:04 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > 
> >> Page faults can race with fallocate hole punch.  If a page fault happens
> >> between the unmap and remove operations, the page is not removed and
> >> remains within the hole.  This is not the desired behavior.  The race
> >> is difficult to detect in user level code as even in the non-race
> >> case, a page within the hole could be faulted back in before fallocate
> >> returns.  If userfaultfd is expanded to support hugetlbfs in the future,
> >> this race will be easier to observe.
> >>
> >> If this race is detected and a page is mapped, the remove operation
> >> (remove_inode_hugepages) will unmap the page before removing.  The unmap
> >> within remove_inode_hugepages occurs with the hugetlb_fault_mutex held
> >> so that no other faults will be processed until the page is removed.
> >>
> >> The (unmodified) routine hugetlb_vmdelete_list was moved ahead of
> >> remove_inode_hugepages to satisfy the new reference.
> >>
> >> ...
> >>
> >> --- a/fs/hugetlbfs/inode.c
> >> +++ b/fs/hugetlbfs/inode.c
> >>
> >> ...
> >>
> >> @@ -395,37 +431,43 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> >>  							mapping, next, 0);
> >>  			mutex_lock(&hugetlb_fault_mutex_table[hash]);
> >>  
> >> -			lock_page(page);
> >> -			if (likely(!page_mapped(page))) {
> > 
> > hm, what are the locking requirements for page_mapped()?
> 
> page_mapped is just reading/evaluating an atomic within the struct page
> which we have a referene on from the pagevec_lookup.  But, I think the
> question is what prevents page_mapped from changing after we check it?
> 
> The patch takes the hugetlb_fault_mutex_table lock before checking
> page_mapped.  If the page is unmapped and the hugetlb_fault_mutex_table
> is held, it can not be faulted in and change from unmapped to mapped.
> 
> The new comment in the patch about taking hugetlb_fault_mutex_table is
> right before the check for page_mapped.

OK, thanks.

> > 
> >> -				bool rsv_on_error = !PagePrivate(page);
> >> -				/*
> >> -				 * We must free the huge page and remove
> >> -				 * from page cache (remove_huge_page) BEFORE
> >> -				 * removing the region/reserve map
> >> -				 * (hugetlb_unreserve_pages).  In rare out
> >> -				 * of memory conditions, removal of the
> >> -				 * region/reserve map could fail.  Before
> >> -				 * free'ing the page, note PagePrivate which
> >> -				 * is used in case of error.
> >> -				 */
> >> -				remove_huge_page(page);
> > 
> > And remove_huge_page().
> 
> The page must be locked before calling remove_huge_page, since it will
> call delete_from_page_cache.  It currently is locked.  Would you prefer
> a comment stating this before the call?

No, that doesn't seem nevessary.

I'll mark this patch as "pending, awaiting Mike's go-ahead".

  reply	other threads:[~2016-01-12  0:29 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-06 22:37 [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch Mike Kravetz
2016-01-06 22:37 ` Mike Kravetz
2016-01-07  8:06 ` Hillf Danton
2016-01-07  8:06   ` Hillf Danton
2016-01-07 16:46   ` Mike Kravetz
2016-01-07 16:46     ` Mike Kravetz
2016-01-08  4:39   ` Mike Kravetz
2016-01-08  4:39     ` Mike Kravetz
2016-01-08  6:25     ` Hillf Danton
2016-01-08  6:25       ` Hillf Danton
2016-01-11 22:35 ` Andrew Morton
2016-01-11 22:35   ` Andrew Morton
2016-01-11 23:38   ` Mike Kravetz
2016-01-11 23:38     ` Mike Kravetz
2016-01-12  0:29     ` Andrew Morton [this message]
2016-01-12  0:29       ` Andrew Morton
2016-01-12  1:36       ` Mike Kravetz
2016-01-12  1:36         ` Mike Kravetz
2016-01-12  2:20         ` Andrew Morton
2016-01-12  2:20           ` Andrew Morton
2016-01-12  3:21           ` Mike Kravetz
2016-01-12  3:21             ` Mike Kravetz
2016-01-12  4:35             ` Andrew Morton
2016-01-12  4:35               ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160111162931.0bea916e.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.