From: Andrea Arcangeli <aarcange@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [BUG] vfio device assignment regression with THP ref counting redesign
Date: Fri, 29 Apr 2016 01:21:27 +0200
Message-ID: <20160428232127.GL11700@redhat.com>
In-Reply-To: <20160428125808.29ad59e5@t450s.home>
Hello Alex and Kirill,
On Thu, Apr 28, 2016 at 12:58:08PM -0600, Alex Williamson wrote:
> > > specific fix to this code is not applicable. It also still occurs on
> > > kernels as recent as v4.6-rc5, so the issue hasn't been silently fixed
> > > yet. I'm able to reproduce this fairly quickly with the above test,
> > > but it's not hard to imagine a test w/o any iommu dependencies which
> > > simply does a user directed get_user_pages_fast() on a set of userspace
> > > addresses, retains the reference, and at some point later rechecks that
> > > a new get_user_pages_fast() results in the same page address. It
Can you try to "git revert 1f25fe20a76af0d960172fb104d4b13697cafa84"
and then apply the below patch on top of the revert?
Totally untested... if I missed something and it isn't correct, I hope
this brings us in the right direction faster at least.
Overall, I think the problem is that we need to restore full accuracy,
because we can't tolerate false positive COWs. They aren't cheap
either: reading 512 cachelines (32k) should be much faster than
copying 2MB and using 4MB of CPU cache. The downside, of course, is
that when we really do need a COW we waste an additional 32k, but that
doesn't matter much since we'd be forced to load 4MB of cache anyway
in that case. There's room for optimization, but even the simple patch
below would be ok for now.
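The 32k and 4MB figures can be reproduced with a little arithmetic. The sketch below is only an illustration: the 4 KiB base page, 2 MiB huge page and 64-byte cacheline are assumed x86-64 values, the function names are hypothetical, and "one cacheline per subpage" assumes each subpage's mapcount lives in its own cacheline.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 sizes; the email only states the resulting totals. */
#define BASE_PAGE_BYTES ((size_t)4096)
#define THP_BYTES       ((size_t)2 * 1024 * 1024)
#define CACHELINE_BYTES ((size_t)64)

/* Number of base pages (and so per-subpage mapcounts) in one THP. */
static size_t thp_subpages(void)
{
	assert(THP_BYTES % BASE_PAGE_BYTES == 0);
	return THP_BYTES / BASE_PAGE_BYTES;
}

/* Cache touched by reading one cacheline per subpage (the accurate check). */
static size_t mapcount_scan_bytes(void)
{
	return thp_subpages() * CACHELINE_BYTES;
}

/* Cache touched by a huge-page COW: read the source, write the copy. */
static size_t thp_cow_bytes(void)
{
	return 2 * THP_BYTES;
}
```

Under these assumptions the scan touches 32 KiB against 4 MiB for a copy, a factor of 128, which is why a false positive COW costs far more than the accurate check.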
>From 09e3d1ff10b49fb9c3ab77f0b96a862848e30067 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 29 Apr 2016 01:05:06 +0200
Subject: [PATCH 1/1] mm: thp: calculate page_mapcount() correctly for THP pages
This allows reverting commit 1f25fe20a76af0d960172fb104d4b13697cafa84
and provides full accuracy with wrprotect faults, so page pinning
will stop causing false positive copy-on-writes.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/util.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 6cc81e7..a0b9f63 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -383,9 +383,10 @@ struct address_space *page_mapping(struct page *page)
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
 {
-	int ret;
+	int ret = 0, i;
 
-	ret = atomic_read(&page->_mapcount) + 1;
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		ret = max(ret, atomic_read(&page->_mapcount) + 1);
 	page = compound_head(page);
 	ret += atomic_read(compound_mapcount_ptr(page)) + 1;
 	if (PageDoubleMap(page))
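Note that, as posted, the loop body does not depend on i: every iteration reads the same page->_mapcount, so the result equals the pre-patch value. Given the "totally untested" caveat, one plausible reading is that the intent was to take the maximum over all subpages of the compound page. The userspace model below sketches that reading only; struct model_thp, MODEL_HPAGE_PMD_NR and model_max_mapcount() are hypothetical names, not kernel APIs, and the PageDoubleMap adjustment is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed: 2 MiB THP made of 4 KiB base pages. */
#define MODEL_HPAGE_PMD_NR 512

/* Hypothetical stand-in for a compound page; not a kernel structure. */
struct model_thp {
	/* Per-subpage PTE mapcount, biased like _mapcount: -1 means unmapped. */
	int subpage_mapcount[MODEL_HPAGE_PMD_NR];
	/* PMD mapcount, biased the same way: -1 means no PMD mapping. */
	int compound_mapcount;
};

/*
 * Highest number of mappings any single subpage can have:
 * the maximum per-subpage PTE count plus the PMD count.
 */
static int model_max_mapcount(const struct model_thp *thp)
{
	int ret = 0;
	size_t i;

	assert(thp != NULL);
	for (i = 0; i < MODEL_HPAGE_PMD_NR; i++) {
		int mc = thp->subpage_mapcount[i] + 1;

		if (mc > ret)
			ret = mc;
	}
	return ret + thp->compound_mapcount + 1;
}
```

In this model a write fault would treat the page as exclusively owned only when the function returns 1, regardless of which subpage the fault hit, so one subpage shared through a PTE mapping is enough to require a COW.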