* Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
[not found] <20120927215147.A3F935C0050@hpza9.eem.corp.google.com>
@ 2012-09-27 23:40 ` Linus Torvalds
2012-09-28 12:07 ` Andrea Arcangeli
0 siblings, 1 reply; 2+ messages in thread
From: Linus Torvalds @ 2012-09-27 23:40 UTC (permalink / raw)
To: akpm
Cc: aarcange, hughd, jweiner, mgorman, pholasek, riel, David Rientjes,
Naoya Horiguchi, KAMEZAWA Hiroyuki, linux-mm
On Thu, Sep 27, 2012 at 2:51 PM, <akpm@linux-foundation.org> wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
>
> Some time ago Petr once reproduced a false positive VM_BUG_ON in
> khugepaged while running the autonuma-benchmark on a large 8 node system.
> All production kernels out there have DEBUG_VM=n so it was only noticeable
> on self built kernels. It's not easily reproducible even on the 8 nodes
> system.
>
> Use page_freeze_refs to prevent speculative pagecache lookups to
> trigger the false positives, so we're still able to check the
> page_count to be exact.
This is too ugly to live. It also fundamentally changes semantics and
actually makes CONFIG_DEBUG_VM result in totally different behavior.
I really don't think it's a good feature to make CONFIG_DEBUG_VM
actually seriously change serialization.
Either do the page_freeze_refs thing *unconditionally*, presumably
replacing the current code that does
...
/* cannot use mapcount: can't collapse if there's a gup pin */
if (page_count(page) != 1) {
instead, or then just relax the potentially racy VM_BUG_ON() to just
check >= 2. Because debug stuff that changes semantics really is
horribly horribly bad.
Btw, there are two other thp patches (relating to mlock) floating
around. They look much more reasonable than this one, but I was hoping
to see more ack's for them.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
2012-09-27 23:40 ` [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy Linus Torvalds
@ 2012-09-28 12:07 ` Andrea Arcangeli
0 siblings, 0 replies; 2+ messages in thread
From: Andrea Arcangeli @ 2012-09-28 12:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: akpm, hughd, jweiner, mgorman, pholasek, riel, David Rientjes,
Naoya Horiguchi, KAMEZAWA Hiroyuki, linux-mm
Hi Linus,
On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote:
> On Thu, Sep 27, 2012 at 2:51 PM, <akpm@linux-foundation.org> wrote:
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
> >
> > Some time ago Petr once reproduced a false positive VM_BUG_ON in
> > khugepaged while running the autonuma-benchmark on a large 8 node system.
> > All production kernels out there have DEBUG_VM=n so it was only noticeable
> > on self built kernels. It's not easily reproducible even on the 8 nodes
> > system.
> >
> > Use page_freeze_refs to prevent speculative pagecache lookups to
> > trigger the false positives, so we're still able to check the
> > page_count to be exact.
>
> This is too ugly to live. It also fundamentally changes semantics and
> actually makes CONFIG_DEBUG_VM result in totally different behavior.
>
> I really don't think it's a good feature to make CONFIG_DEBUG_VM
> actually seriously change serialization.
>
> Either do the page_freeze_refs thing *unconditionally*, presumably
> replacing the current code that does
>
> ...
> /* cannot use mapcount: can't collapse if there's a gup pin */
> if (page_count(page) != 1) {
>
> instead, or then just relax the potentially racy VM_BUG_ON() to just
> check >= 2. Because debug stuff that changes semantics really is
> horribly horribly bad.
Agreed. I think we can simply drop that VM_BUG_ON: there's a
putback_lru_page that does a put_page, and then another that does
page_cache_release just after the VM_BUG_ON, so if the count is < 2
we'll notice when the count underflows in free_page_and_swap_cache
(we've a VM_BUG_ON in put_page_testzero for that).
The reason for too ugly to live patch, is that I wanted to take the
opportunity of a workload on a 128 CPU 8 node system that could
reproduce it to verify that it really was the speculative lookup and
not something else (and something else wouldn't have been good :). But
I fully agree with you that after successfully verifying that, it's
good to go with a non-ugly version. Thanks for checking this!
I'll send an updated patch.
> Btw, there are two other thp patches (relating to mlock) floating
> around. They look much more reasonable than this one, but I was hoping
> to see more ack's for them.
Ok! I'll review them too. I read them and they looked all right but I
didn't spend enough time on it yet to be sure and send an ack sorry.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2012-09-28 12:07 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20120927215147.A3F935C0050@hpza9.eem.corp.google.com>
2012-09-27 23:40 ` [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy Linus Torvalds
2012-09-28 12:07 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).