Message-ID: <534E97D7.4060903@oracle.com>
Date: Wed, 16 Apr 2014 10:46:47 -0400
From: Sasha Levin
To: "Kirill A. Shutemov", Andrea Arcangeli, Andrew Morton
CC: Rik van Riel, Mel Gorman, Michel Lespinasse, Dave Jones, Vlastimil Babka, Bob Liu, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] thp: close race between split and zap huge pages
References: <1397598515-25017-1-git-send-email-kirill.shutemov@linux.intel.com>
In-Reply-To: <1397598515-25017-1-git-send-email-kirill.shutemov@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/15/2014 05:48 PM, Kirill A. Shutemov wrote:
> Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have
> the same root cause. Let's look at them one by one.
>
> The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".
> It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().
> From my testing I see that page_mapcount() is higher than mapcount here.
>
> I think it happens due to a race between zap_huge_pmd() and
> page_check_address_pmd().
> page_check_address_pmd() misses a PMD
> which is under zap:
>
>         CPU0                                    CPU1
>                                         zap_huge_pmd()
>                                           pmdp_get_and_clear()
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           /*
>            * We check if PMD present without taking ptl: no
>            * serialization against zap_huge_pmd(). We miss this PMD,
>            * it's not accounted to 'mapcount' in __split_huge_page().
>            */
>           pmd_present(pmd) == 0
>
>   BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
>
>                                           page_remove_rmap(page)
>                                             atomic_add_negative(-1, &page->_mapcount)
>
> The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
> It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
>
> This happens in a similar way:
>
>         CPU0                                    CPU1
>                                         zap_huge_pmd()
>                                           pmdp_get_and_clear()
>                                           page_remove_rmap(page)
>                                             atomic_add_negative(-1, &page->_mapcount)
> __split_huge_page()
>   anon_vma_interval_tree_foreach()
>     __split_huge_page_splitting()
>       page_check_address_pmd()
>         mm_find_pmd()
>           pmd_present(pmd) == 0 /* The same comment as above */
>   /*
>    * No crash this time since we already decremented page->_mapcount in
>    * zap_huge_pmd().
>    */
>   BUG_ON(mapcount != page_mapcount(page))
>
>   /*
>    * We split the compound page here into small pages without
>    * serialization against zap_huge_pmd()
>    */
>   __split_huge_page_refcount()
>                                         VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
>
> So my understanding is that the problem is the pmd_present() check in
> mm_find_pmd() without taking the page table lock.
>
> The bug was introduced by my commit 117b0791ac42. Sorry for
> that. :(
>
> Let's open code mm_find_pmd() in page_check_address_pmd() and do the
> check under the page table lock.
>
> Note that __page_check_address() does the same for PTE entries
> if sync != 0.
>
> I've stress tested the split and zap code paths for 36+ hours by now and
> don't see crashes with the patch applied. Before, it took <20 min to
> trigger the first bug and a few hours for the second one (if we ignore
> the first).
>
> [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
> [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
>
> Signed-off-by: Kirill A. Shutemov
> Reported-by: Sasha Levin
> Cc: #3.13+

Seems to work for me, thanks!


Thanks,
Sasha
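
For anyone following the first trace in the quoted description, here is a minimal userspace sketch of the window it describes. It is a toy model and not kernel code: `toy_state`, `check_point`, `snapshot_consistent()` and `check_orderings()` are hypothetical names made up for illustration, and the model assumes the zap side performs exactly two steps (clear the PMD, then drop the mapcount) with the split side's presence check landing before, between, or after them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */

/* Where the split side's presence check lands relative to the zap side. */
enum check_point {
	BEFORE_ZAP,	/* before the PMD is cleared */
	MID_ZAP,	/* PMD cleared, mapcount not yet decremented */
	AFTER_ZAP	/* PMD cleared and mapcount decremented */
};

struct toy_state {
	bool pmd_present;	/* stands in for pmd_present(pmd) */
	int mapcount;		/* stands in for page_mapcount(page) */
};

/*
 * Replay the zap steps up to 'when', then do what the split side does:
 * count the mapping only if the PMD looks present, and compare that
 * count with the page's mapcount. Returns true when the two agree.
 */
static bool snapshot_consistent(enum check_point when)
{
	struct toy_state s = { .pmd_present = true, .mapcount = 1 };
	int counted;

	if (when != BEFORE_ZAP)
		s.pmd_present = false;	/* models pmdp_get_and_clear() */
	if (when == AFTER_ZAP)
		s.mapcount--;		/* models page_remove_rmap() */

	counted = s.pmd_present ? 1 : 0;
	return counted == s.mapcount;
}

/* Sanity-check the three orderings; only the mid-zap snapshot disagrees. */
static void check_orderings(void)
{
	assert(snapshot_consistent(BEFORE_ZAP));
	assert(!snapshot_consistent(MID_ZAP));
	assert(snapshot_consistent(AFTER_ZAP));
}
```

In this model only MID_ZAP disagrees: the check sees no PMD while the mapcount is still 1, which matches the "page_mapcount() is higher than mapcount" observation. Assuming both sides take the same lock around these steps, the split side could only observe the BEFORE_ZAP or AFTER_ZAP state, which is the effect of doing the check under the page table lock.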