All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Rafael Aquini <aquini@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, mhocko@suse.com, vbabka@suse.cz,
	kirill.shutemov@linux.intel.com
Subject: Re: [PATCH] mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
Date: Sun, 16 Feb 2020 23:32:33 +0000	[thread overview]
Message-ID: <20200216233232.GZ3420@suse.de> (raw)
In-Reply-To: <20200216191800.22423-1-aquini@redhat.com>

On Sun, Feb 16, 2020 at 02:18:00PM -0500, Rafael Aquini wrote:
> From: Mel Gorman <mgorman@techsingularity.net>
>   A user reported a bug against a distribution kernel while running
>   a proprietary workload described as "memory intensive that is not
>   swapping" that is expected to apply to mainline kernels. The workload
>   is read/write/modifying ranges of memory and checking the contents. They
>   reported that within a few hours that a bad PMD would be reported followed
>   by a memory corruption where expected data was all zeros.  A partial report
>   of the bad PMD looked like
> 
>   [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
>   [ 5195.341184] ------------[ cut here ]------------
>   [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
>   ....
>   [ 5195.410033] Call Trace:
>   [ 5195.410471]  [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930
>   [ 5195.410716]  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
>   [ 5195.410918]  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
>   [ 5195.411200]  [<ffffffff81098322>] task_work_run+0x72/0x90
>   [ 5195.411246]  [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
>   [ 5195.411494]  [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
>   [ 5195.411739]  [<ffffffff815e56af>] retint_user+0x8/0x10
> 
>   Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
>   was a false detection. The bug does not trigger if automatic NUMA balancing
>   or transparent huge pages is disabled.
> 
>   The bug is due a race in change_pmd_range between a pmd_trans_huge and
>   pmd_nond_or_clear_bad check without any locks held. During the pmd_trans_huge
>   check, a parallel protection update under lock can have cleared the PMD
>   and filled it with a prot_numa entry between the transhuge check and the
>   pmd_none_or_clear_bad check.
> 
>   While this could be fixed with heavy locking, it's only necessary to
>   make a copy of the PMD on the stack during change_pmd_range and avoid
>   races. A new helper is created for this as the check if quite subtle and the
>   existing similar helpful is not suitable. This passed 154 hours of testing
>   (usually triggers between 20 minutes and 24 hours) without detecting bad
>   PMDs or corruption. A basic test of an autonuma-intensive workload showed
>   no significant change in behaviour.
> 
> Although Mel withdrew the patch on the face of LKML comment https://lkml.org/lkml/2017/4/10/922
> the race window aforementioned is still open, and we have reports of Linpack test reporting bad
> residuals after the bad PMD warning is observed. In addition to that, bad rss-counter and
> non-zero pgtables assertions are triggered on mm teardown for the task hitting the bad PMD.
> 
>  host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7)
>  ....
>  host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512
>  host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096
> 
> The issue is observed on a v4.18-based distribution kernel, but the race window is
> expected to be applicable to mainline kernels, as well.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Rafael Aquini <aquini@redhat.com>

It's curious that it took so long for this to be caught again.
Unfortunately I cannot find exactly what it's racing against but maybe
it's not worth chasing down and the patch is simply the safer option :(

-- 
Mel Gorman
SUSE Labs


  reply	other threads:[~2020-02-16 23:32 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-16 19:18 [PATCH] mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa Rafael Aquini
2020-02-16 23:32 ` Mel Gorman [this message]
2020-03-07  2:40 ` Qian Cai
2020-03-07  3:05   ` Rafael Aquini
2020-03-08  3:20     ` Qian Cai
2020-03-08 23:14       ` Rafael Aquini
2020-03-09  3:27         ` Qian Cai
2020-03-09 15:05           ` Rafael Aquini
2020-03-11  0:04             ` Qian Cai
  -- strict thread matches above, loose matches on Subject: below --
2017-04-10  9:48 [PATCH] mm, numa: Fix " Mel Gorman
2017-04-10  9:48 ` Mel Gorman
2017-04-10 10:03 ` Vlastimil Babka
2017-04-10 10:03   ` Vlastimil Babka
2017-04-10 12:19   ` Mel Gorman
2017-04-10 12:19     ` Mel Gorman
2017-04-10 12:38 ` Rik van Riel
2017-04-10 12:38   ` Rik van Riel
2017-04-10 13:53 ` Michal Hocko
2017-04-10 13:53   ` Michal Hocko
2017-04-10 17:38   ` Mel Gorman
2017-04-10 17:38     ` Mel Gorman
2017-04-10 16:45 ` Zi Yan
2017-04-10 17:20   ` Mel Gorman
2017-04-10 17:20     ` Mel Gorman
2017-04-10 17:49     ` Zi Yan
2017-04-10 18:07       ` Mel Gorman
2017-04-10 18:07         ` Mel Gorman
2017-04-10 22:09         ` Andrew Morton
2017-04-10 22:09           ` Andrew Morton
2017-04-10 22:28           ` Zi Yan
2017-04-11  6:35             ` Vlastimil Babka
2017-04-11  6:35               ` Vlastimil Babka
2017-04-11 21:44               ` Andrew Morton
2017-04-11 21:44                 ` Andrew Morton
2017-04-11  8:29           ` Mel Gorman
2017-04-11  8:29             ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200216233232.GZ3420@suse.de \
    --to=mgorman@techsingularity.net \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.