From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <lnux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [PATCH v2] mm/hugetlbfs Fix bugs in fallocate hole punch of areas with holes
Date: Tue, 10 Nov 2015 19:39:45 -0800 [thread overview]
Message-ID: <5642B881.4030401@oracle.com> (raw)
In-Reply-To: <20151111025802.GA18535@hori1.linux.bs1.fc.nec.co.jp>
On 11/10/2015 06:58 PM, Naoya Horiguchi wrote:
> Hello Mike,
>
> On Tue, Nov 10, 2015 at 05:38:01PM -0800, Mike Kravetz wrote:
>> This is against linux-stable 4.3. Will send to stable@vger.kernel.org
>> when Ack'ed here.
>
> This is not what stable stuff works, please see
> Documentation/stable_kernel_rules.txt.
Ok. I'll resend with the Cc.
>
>> Hugh Dickins pointed out problems with the new hugetlbfs fallocate
>> hole punch code. These problems are in the routine remove_inode_hugepages
>> and mostly occur in the case where there are holes in the range of
>> pages to be removed. These holes could be the result of a previous hole
>> punch or simply sparse allocation.
>>
>> remove_inode_hugepages handles both hole punch and truncate operations.
>> Page index handling was fixed/cleaned up so that the loop index always
>> matches the page being processed. The code now only makes a single pass
>> through the range of pages as it was determined page faults could not
>> race with truncate. A cond_resched() was added after removing up to
>> PAGEVEC_SIZE pages.
>>
>> Some totally unnecessary code in hugetlbfs_fallocate() that remained from
>> early development was also removed.
>>
>> v2:
>> Make remove_inode_hugepages simpler after verifying truncate can not
>> race with page faults here.
>>
>> Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
>
> Cc: stable@vger.kernel.org [4.3]
Will add.
>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> fs/hugetlbfs/inode.c | 57 ++++++++++++++++++++++++++--------------------------
>> 1 file changed, 29 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 316adb9..8290f39 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -332,12 +332,15 @@ static void remove_huge_page(struct page *page)
>> * truncation is indicated by end of range being LLONG_MAX
>> * In this case, we first scan the range and release found pages.
>> * After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
>> - * maps and global counts.
>> + * maps and global counts. Page faults can not race with truncation
>> + * in this routine. hugetlb_no_page() prevents page faults in the
>> + * truncated range.
>
> Could you be specific about how/why? Maybe hugetlb_fault_mutex_hash and/or
> i_size check should be mentioned, because it's not so obvious.
The long explanation is here:
http://marc.info/?l=linux-mm&m=144719585221409&w=2
I will include a brief summary here.
>
>> * hole punch is indicated if end is not LLONG_MAX
>> * In the hole punch case we scan the range and release found pages.
>> * Only when releasing a page is the associated region/reserv map
>> * deleted. The region/reserv map for ranges without associated
>> - * pages are not modified.
>> + * pages are not modified. Page faults can race with hole punch.
>> + * This is indicated if we find a mapped page.
>> * Note: If the passed end of range value is beyond the end of file, but
>> * not LLONG_MAX this routine still performs a hole punch operation.
>> */
>> @@ -361,44 +364,38 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>> next = start;
>> while (next < end) {
>> /*
>> - * Make sure to never grab more pages that we
>> - * might possibly need.
>> + * Don't grab more pages than the number left in the range.
>> */
>> if (end - next < lookup_nr)
>> lookup_nr = end - next;
>>
>> /*
>> - * This pagevec_lookup() may return pages past 'end',
>> - * so we must check for page->index > end.
>> + * When no more pages are found, we are done.
>> */
>> - if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) {
>> - if (next == start)
>> - break;
>> - next = start;
>> - continue;
>> - }
>> + if (!pagevec_lookup(&pvec, mapping, next, lookup_nr))
>> + break;
>>
>> for (i = 0; i < pagevec_count(&pvec); ++i) {
>> struct page *page = pvec.pages[i];
>> u32 hash;
>>
>> + /*
>> + * The page (index) could be beyond end. This is
>> + * only possible in the punch hole case as end is
>> + * max page offset in the truncate case.
>> + */
>> + next = page->index;
>> + if (next >= end)
>> + break;
>> +
>> hash = hugetlb_fault_mutex_hash(h, current->mm,
>> &pseudo_vma,
>> mapping, next, 0);
>> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>>
>> lock_page(page);
>> - if (page->index >= end) {
>> - unlock_page(page);
>> - mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>> - next = end; /* we are done */
>> - break;
>> - }
>> -
>> /*
>> - * If page is mapped, it was faulted in after being
>> - * unmapped. Do nothing in this race case. In the
>> - * normal case page is not mapped.
>> + * In the normal case the page is not mapped.
>> */
>> if (!page_mapped(page)) {
>
> I feel that doing like "likely(!page_mapped(page))" without comment is enough
> and self-descriptive.
>
Ok, makes sense
>> bool rsv_on_error = !PagePrivate(page);
>> @@ -421,17 +418,24 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>> hugetlb_fix_reserve_counts(
>> inode, rsv_on_error);
>> }
>> + } else {
>> + /*
>> + * If page is mapped, it was faulted in after
>> + * being unmapped. It indicates a race between
>> + * hole punch and page fault. Do nothing in
>> + * this case. Getting here in a truncate
>> + * operation is a bug.
>> + */
>> + BUG_ON(truncate_op);
>> }
>>
>> - if (page->index > next)
>> - next = page->index;
>> -
>> ++next;
>
> My comment was ignored for some reason?
> http://marc.info/?l=linux-kernel&m=144705235903057&w=2
My apologies. I somehow overlooked that e-mail. It was not my intention
to ignore your comments.
>From that comment, I agree than the ++next should be moved outside
the for look.
>
> Anyway, I think the patch's idea is OK, so
>
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Thanks for your comments. I'll respin shortly and incorporate your
comments.
--
Mike Kravetz
>
> Thanks,
> Naoya Horiguchi
>
>> unlock_page(page);
>>
>> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>> }
>> huge_pagevec_release(&pvec);
>> + cond_resched();
>> }
>>
>> if (truncate_op)
>> @@ -647,9 +651,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>> if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
>> i_size_write(inode, offset + len);
>> inode->i_ctime = CURRENT_TIME;
>> - spin_lock(&inode->i_lock);
>> - inode->i_private = NULL;
>> - spin_unlock(&inode->i_lock);
>> out:
>> mutex_unlock(&inode->i_mutex);
>> return error;
>> --
>> 2.4.3
next prev parent reply other threads:[~2015-11-11 3:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-11 1:38 [PATCH v2] mm/hugetlbfs Fix bugs in fallocate hole punch of areas with holes Mike Kravetz
2015-11-11 1:38 ` Mike Kravetz
2015-11-11 2:58 ` Naoya Horiguchi
2015-11-11 3:39 ` Mike Kravetz [this message]
-- strict thread matches above, loose matches on Subject: below --
2015-11-11 1:38 Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5642B881.4030401@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lnux-mm@kvack.org \
--cc=n-horiguchi@ah.jp.nec.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.