Date: Mon, 12 Oct 2015 13:13:20 +0300
From: "Kirill A. Shutemov"
To: Minchan Kim
Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] thp: use is_zero_pfn after pte_present check
Message-ID: <20151012101320.GB2544@node>
In-Reply-To: <1444614856-18543-1-git-send-email-minchan@kernel.org>

On Mon, Oct 12, 2015 at 10:54:16AM +0900, Minchan Kim wrote:
> Use is_zero_pfn on pteval only after the pte_present check on pteval
> (it might be a better idea to introduce is_zero_pte, which checks
> pte_present first). Otherwise, it could operate on a swap or
> migration entry, and if pte_pfn's result happens to equal zero_pfn,
> we lose the user's data in __collapse_huge_page_copy.
> If you're lucky, the application segfaults, and when it exits
> you may see the message below:
>
> BUG: Bad rss-counter state mm:ffff88007f099300 idx:2 val:3

Did you actually step on the bug? If so, I think it's a candidate for stable@.

> Signed-off-by: Minchan Kim
> ---
>
> I found this bug with a MADV_FREE stress test.
> Sometimes I saw the "Bad rss-counter" message with MM_SWAPENTS,
> but it's really rare: once a day if I was lucky, or once in five
> days if I was unlucky. So I am still testing, and only a few days
> have passed, but I hope this fixes the issue.
>
>  mm/huge_memory.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4b06b8db9df2..349590aa4533 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2665,15 +2665,25 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
>  	     _pte++, _address += PAGE_SIZE) {
>  		pte_t pteval = *_pte;
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +		if (pte_none(pteval)) {

In the -mm tree we have an is_swap_pte() check before this point in
khugepaged_scan_pmd().

Also, what about the similar pattern in __collapse_huge_page_isolate()
and __collapse_huge_page_copy()? Shouldn't they be fixed as well?

>  			if (!userfaultfd_armed(vma) &&
>  			    ++none_or_zero <= khugepaged_max_ptes_none)
>  				continue;
>  			else
>  				goto out_unmap;
>  		}
> +
>  		if (!pte_present(pteval))
>  			goto out_unmap;
> +
> +		if (is_zero_pfn(pte_pfn(pteval))) {
> +			if (!userfaultfd_armed(vma) &&
> +			    ++none_or_zero <= khugepaged_max_ptes_none)
> +				continue;
> +			else
> +				goto out_unmap;
> +		}
> +
>  		if (pte_write(pteval))
>  			writable = true;
>
> --
> 1.9.1

--
 Kirill A. Shutemov
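A minimal user-space sketch of the ordering problem the patch addresses, assuming a hypothetical, simplified PTE layout (bit 0 = present; the bits from `PFN_SHIFT` up hold a PFN when present, or a swap offset when not) and a made-up `zero_pfn` value. Real architectures encode swap and migration entries differently, and `classify_old()`, `classify_new()` and the `verdict` values are illustrative names, not kernel code. The sketch only shows why `pte_pfn()` should not be trusted before `pte_present()` has been checked.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified PTE layout for illustration only (not any real
 * architecture): bit 0 is the "present" bit; the bits from PFN_SHIFT up hold
 * a page frame number when present, or a swap offset when not present.
 */
typedef uint64_t pte_t;

#define PTE_PRESENT 0x1ULL
#define PFN_SHIFT 12

/* Hypothetical PFN of the shared zero page. */
static const uint64_t zero_pfn = 0x42;

static bool pte_none(pte_t pte) { return pte == 0; }
static bool pte_present(pte_t pte) { return (pte & PTE_PRESENT) != 0; }

/* Decodes the PFN field without checking presence, so on a non-present
 * entry it returns whatever the swap offset bits happen to be. */
static uint64_t pte_pfn(pte_t pte) { return pte >> PFN_SHIFT; }
static bool is_zero_pfn(uint64_t pfn) { return pfn == zero_pfn; }

enum verdict { SKIP_AS_NONE_OR_ZERO, ABORT_SCAN, CANDIDATE };

/* Pre-patch ordering: the zero-PFN test runs before pte_present(). */
enum verdict classify_old(pte_t pte)
{
	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte)))
		return SKIP_AS_NONE_OR_ZERO;
	if (!pte_present(pte))
		return ABORT_SCAN;
	return CANDIDATE;
}

/* Patched ordering: the PFN is decoded only after pte_present() succeeds. */
enum verdict classify_new(pte_t pte)
{
	if (pte_none(pte))
		return SKIP_AS_NONE_OR_ZERO;
	if (!pte_present(pte))
		return ABORT_SCAN;
	if (is_zero_pfn(pte_pfn(pte)))
		return SKIP_AS_NONE_OR_ZERO;
	return CANDIDATE;
}
```

For a non-present entry such as `zero_pfn << PFN_SHIFT`, `classify_old()` returns `SKIP_AS_NONE_OR_ZERO`, so the swap-like entry is treated as a zero page, while `classify_new()` returns `ABORT_SCAN`. A present mapping of the zero page is still skipped by both versions.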