* Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
2012-09-27 23:40 ` [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy Linus Torvalds
@ 2012-09-28 12:07 ` Andrea Arcangeli
0 siblings, 0 replies; 2+ messages in thread
From: Andrea Arcangeli @ 2012-09-28 12:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: akpm, hughd, jweiner, mgorman, pholasek, riel, David Rientjes,
Naoya Horiguchi, KAMEZAWA Hiroyuki, linux-mm
Hi Linus,
On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote:
> On Thu, Sep 27, 2012 at 2:51 PM, <akpm@linux-foundation.org> wrote:
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
> >
> > Some time ago Petr once reproduced a false positive VM_BUG_ON in
> > khugepaged while running the autonuma-benchmark on a large 8 node system.
> > All production kernels out there have DEBUG_VM=n so it was only noticeable
> > on self built kernels. It's not easily reproducible even on the 8 nodes
> > system.
> >
> > Use page_freeze_refs to prevent speculative pagecache lookups to
> > trigger the false positives, so we're still able to check the
> > page_count to be exact.
>
> This is too ugly to live. It also fundamentally changes semantics and
> actually makes CONFIG_DEBUG_VM result in totally different behavior.
>
> I really don't think it's a good feature to make CONFIG_DEBUG_VM
> actually seriously change serialization.
>
> Either do the page_freeze_refs thing *unconditionally*, presumably
> replacing the current code that does
>
> ...
> /* cannot use mapcount: can't collapse if there's a gup pin */
> if (page_count(page) != 1) {
>
> instead, or then just relax the potentially racy VM_BUG_ON() to just
> check >= 2. Because debug stuff that changes semantics really is
> horribly horribly bad.
Agreed. I think we can simply drop that VM_BUG_ON: there's a
putback_lru_page that does a put_page, and then another that does
page_cache_release just after the VM_BUG_ON, so if the count is < 2
we'll notice when the count underflows in free_page_and_swap_cache
(we've a VM_BUG_ON in put_page_testzero for that).
The reason for too ugly to live patch, is that I wanted to take the
opportunity of a workload on a 128 CPU 8 node system that could
reproduce it to verify that it really was the speculative lookup and
not something else (and something else wouldn't have been good :). But
I fully agree with you that after successfully verifying that, it's
good to go with a non-ugly version. Thanks for checking this!
I'll send an updated patch.
> Btw, there are two other thp patches (relating to mlock) floating
> around. They look much more reasonable than this one, but I was hoping
> to see more ack's for them.
Ok! I'll review them too. I read them and they looked all right but I
didn't spend enough time on it yet to be sure and send an ack sorry.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 2+ messages in thread