From: Sasha Levin <sasha.levin@oracle.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Michel Lespinasse <walken@google.com>,
Dave Jones <davej@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
Bob Liu <lliubbo@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH] thp: close race between split and zap huge pages
Date: Wed, 16 Apr 2014 10:46:47 -0400
Message-ID: <534E97D7.4060903@oracle.com>
In-Reply-To: <1397598515-25017-1-git-send-email-kirill.shutemov@linux.intel.com>
On 04/15/2014 05:48 PM, Kirill A. Shutemov wrote:
> Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have
> the same root cause. Let's look at them one by one.
>
> The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".
> It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().
> From my testing I see that page_mapcount() is higher than mapcount here.
>
> I think it happens due to a race between zap_huge_pmd() and
> page_check_address_pmd(). page_check_address_pmd() misses a PMD
> which is under zap:
>
>         CPU0                            CPU1
>                                         zap_huge_pmd()
>                                           pmdp_get_and_clear()
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           /*
>            * We check if PMD present without taking ptl: no
>            * serialization against zap_huge_pmd(). We miss this PMD,
>            * it's not accounted to 'mapcount' in __split_huge_page().
>            */
>           pmd_present(pmd) == 0
>
>   BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
>
>                                           page_remove_rmap(page)
>                                             atomic_add_negative(-1, &page->_mapcount)
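For anyone who wants to step through that ordering by hand, here is a
minimal single-threaded replay. It is only a sketch: the struct and the
replay_* helpers are hypothetical names made up for illustration, and a
plain bool and int stand in for the PMD and page->_mapcount. It models a
single mapping and nothing else from the kernel.

```c
#include <stdbool.h>

/* Hypothetical model of one huge-page mapping; not kernel code. */
struct replay_state {
    bool pmd_present;   /* stands in for pmd_present(pmd) */
    int page_mapcount;  /* stands in for page_mapcount(page) */
};

/* CPU1, first step of the zap: the PMD is cleared. */
static void replay_zap_clear(struct replay_state *s)
{
    s->pmd_present = false;
}

/* CPU1, later step of the zap: the mapcount is finally dropped. */
static void replay_zap_remove_rmap(struct replay_state *s)
{
    s->page_mapcount--;
}

/* CPU0: an unlocked walk counts only the mappings it can still see. */
static int replay_split_count(const struct replay_state *s)
{
    return s->pmd_present ? 1 : 0;
}

/* Replays the interleaving quoted above; true means the BUG_ON would fire. */
static bool replay_first_bug(void)
{
    struct replay_state s = { .pmd_present = true, .page_mapcount = 1 };
    int mapcount;

    replay_zap_clear(&s);                /* CPU1 clears the PMD */
    mapcount = replay_split_count(&s);   /* CPU0 misses it */
    return mapcount != s.page_mapcount;  /* counted 0, page still says 1 */
}
```

The mismatch exists only in the window between the two zap steps; once
replay_zap_remove_rmap() has run, the two counts agree again, which fits
the race taking a while to hit.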
>
> The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
> It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
>
> This happens in a similar way:
>
>         CPU0                            CPU1
>                                         zap_huge_pmd()
>                                           pmdp_get_and_clear()
>                                           page_remove_rmap(page)
>                                             atomic_add_negative(-1, &page->_mapcount)
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           pmd_present(pmd) == 0 /* The same comment as above */
>   /*
>    * No crash this time since we already decremented page->_mapcount in
>    * zap_huge_pmd().
>    */
>   BUG_ON(mapcount != page_mapcount(page))
>
>   /*
>    * We split the compound page here into small pages without
>    * serialization against zap_huge_pmd()
>    */
>   __split_huge_page_refcount()
>                                           VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
>
> So my understanding is that the problem is the pmd_present() check in
> mm_find_pmd(), which is done without taking the page table lock.
>
> The bug was introduced by my commit 117b0791ac42. Sorry for
> that. :(
>
> Let's open code mm_find_pmd() in page_check_address_pmd() and do the
> check under page table lock.
>
> Note that __page_check_address() does the same for PTE entries
> if sync != 0.
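To make the "check under page table lock" idea concrete, here is a
userspace sketch of the pattern. It is not the patch: fake_mapping and
the fake_* helpers are hypothetical names, a pthread mutex plays the
role of the ptl, and the model assumes a single mapping whose zap does
both the clear and the mapcount drop inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; none of these names exist in the kernel. */
struct fake_mapping {
    pthread_mutex_t ptl;  /* plays the role of the page table lock */
    bool present;         /* plays the role of pmd_present() */
    int page_mapcount;    /* plays the role of page_mapcount(page) */
};

static void fake_mapping_init(struct fake_mapping *m)
{
    pthread_mutex_init(&m->ptl, NULL);
    m->present = true;
    m->page_mapcount = 1;
}

/* Zap side: clear the entry and drop the mapcount in one critical section. */
static void fake_zap(struct fake_mapping *m)
{
    pthread_mutex_lock(&m->ptl);
    if (m->present) {
        m->present = false;
        assert(m->page_mapcount > 0);
        m->page_mapcount--;
    }
    pthread_mutex_unlock(&m->ptl);
}

/*
 * Split side: test presence only after taking the lock, so the walker
 * sees the mapping either fully intact or fully zapped, never half-way.
 * Returns true when the mappings it counted match page_mapcount.
 */
static bool fake_split_counts_match(struct fake_mapping *m)
{
    int mapcount;
    bool ok;

    pthread_mutex_lock(&m->ptl);
    mapcount = m->present ? 1 : 0;
    ok = (mapcount == m->page_mapcount);
    pthread_mutex_unlock(&m->ptl);
    return ok;
}
```

Calling fake_split_counts_match() before or after fake_zap(), or from
another thread while a zap is in flight, should always return true in
this model, because the presence check can no longer observe the state
between the clear and the mapcount update.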
>
> I've stress tested the split and zap code paths for 36+ hours by now and
> don't see crashes with the patch applied. Before, it took <20 min to
> trigger the first bug and a few hours for the second one (if we ignore
> the first).
>
> [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
> [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Cc: <stable@vger.kernel.org> #3.13+
Seems to work for me, thanks!
Thanks,
Sasha