* FAILED: patch "[PATCH] mm/memory.c: fix a huge pud insertion race during faulting" failed to apply to 4.19-stable tree
@ 2019-12-16 12:00 gregkh
2019-12-16 15:57 ` Sasha Levin
0 siblings, 1 reply; 3+ messages in thread
From: gregkh @ 2019-12-16 12:00 UTC (permalink / raw)
To: thellstrom, akpm, arnd, kirill.shutemov, stable, torvalds, willy; +Cc: stable
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001
From: Thomas Hellstrom <thellstrom@vmware.com>
Date: Sat, 30 Nov 2019 17:51:32 -0800
Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault(). That will lead to pmd_alloc() returning an
invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to
pmd_trans_unstable() and check whether the pud is really stable before
using the pmd pointer.
Race:
Thread 1: Thread 2: Comment
create_huge_pud() Fallback - not taken.
create_huge_pud() Taken.
pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real
experienced problem. It looks to me like the only implementation that
currently creates huge pud pagetable entries is dev_dax_huge_fault()
which doesn't appear to care much about private (COW) mappings or
write-tracking which is, I believe, a prerequisite for create_huge_pud()
falling back on thread 1, but not in thread 2.
Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 3127f9028f54..798ea36a0549 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -938,6 +938,31 @@ static inline int pud_trans_huge(pud_t pud)
}
#endif
+/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */
+static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud)
+{
+ pud_t pudval = READ_ONCE(*pud);
+
+ if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+ return 1;
+ if (unlikely(pud_bad(pudval))) {
+ pud_clear_bad(pud);
+ return 1;
+ }
+ return 0;
+}
+
+/* See pmd_trans_unstable for discussion. */
+static inline int pud_trans_unstable(pud_t *pud)
+{
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
+ defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+ return pud_none_or_trans_huge_or_dev_or_clear_bad(pud);
+#else
+ return 0;
+#endif
+}
+
#ifndef pmd_read_atomic
static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
{
diff --git a/mm/memory.c b/mm/memory.c
index 62b5cce653f6..c3902201989f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4010,6 +4010,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
vmf.pud = pud_alloc(mm, p4d, address);
if (!vmf.pud)
return VM_FAULT_OOM;
+retry_pud:
if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
ret = create_huge_pud(&vmf);
if (!(ret & VM_FAULT_FALLBACK))
@@ -4036,6 +4037,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
vmf.pmd = pmd_alloc(mm, vmf.pud, address);
if (!vmf.pmd)
return VM_FAULT_OOM;
+
+ /* Huge pud page fault raced with pmd_alloc? */
+ if (pud_trans_unstable(vmf.pud))
+ goto retry_pud;
+
if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
ret = create_huge_pmd(&vmf);
if (!(ret & VM_FAULT_FALLBACK))
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: FAILED: patch "[PATCH] mm/memory.c: fix a huge pud insertion race during faulting" failed to apply to 4.19-stable tree
2019-12-16 12:00 FAILED: patch "[PATCH] mm/memory.c: fix a huge pud insertion race during faulting" failed to apply to 4.19-stable tree gregkh
@ 2019-12-16 15:57 ` Sasha Levin
2019-12-16 16:15 ` Thomas Hellstrom
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2019-12-16 15:57 UTC (permalink / raw)
To: gregkh, mhocko
Cc: thellstrom, akpm, arnd, kirill.shutemov, stable, torvalds, willy
On Mon, Dec 16, 2019 at 01:00:44PM +0100, gregkh@linuxfoundation.org wrote:
>
>The patch below does not apply to the 4.19-stable tree.
>If someone wants it applied there, or to any other stable or longterm
>tree, then please email the backport, including the original git commit
>id to <stable@vger.kernel.org>.
>
>thanks,
>
>greg k-h
>
>------------------ original commit in Linus's tree ------------------
>
>From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001
>From: Thomas Hellstrom <thellstrom@vmware.com>
>Date: Sat, 30 Nov 2019 17:51:32 -0800
>Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
>
>A huge pud page can theoretically be faulted in racing with pmd_alloc()
>in __handle_mm_fault(). That will lead to pmd_alloc() returning an
>invalid pmd pointer.
>
>Fix this by adding a pud_trans_unstable() function similar to
>pmd_trans_unstable() and check whether the pud is really stable before
>using the pmd pointer.
>
>Race:
> Thread 1: Thread 2: Comment
> create_huge_pud() Fallback - not taken.
> create_huge_pud() Taken.
> pmd_alloc() Returns an invalid pointer.
>
>This will result in user-visible huge page data corruption.
>
>Note that this was caught during a code audit rather than a real
>experienced problem. It looks to me like the only implementation that
>currently creates huge pud pagetable entries is dev_dax_huge_fault()
>which doesn't appear to care much about private (COW) mappings or
>write-tracking which is, I believe, a prerequisite for create_huge_pud()
>falling back on thread 1, but not in thread 2.
>
>Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
>Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
>Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>Cc: Arnd Bergmann <arnd@arndb.de>
>Cc: Matthew Wilcox <willy@infradead.org>
>Cc: <stable@vger.kernel.org>
>Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This one doesn't apply cleanly because 7635d9cbe832 ("mm, thp, proc:
report THP eligibility for each vma") has changed what
transparent_hugepage_enabled() does.
The "right" backport here would be to simply change from calling
__transparent_hugepage_enabled() to calling
transparent_hugepage_enabled() as we don't have 7635d9cbe832 in older
kernels, but I worry that if we do end up backporting some part of that
logic change later it will diverge us from upstream and will cause for
subtle issues that are difficult to debug.
So unless Michal / Andrew yell at me for this, I'm going to take in
7635d9cbe832 even though it's clearly a new feature just to make
625110b5e9da and future patches apply cleanly, and avoid future issues.
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: FAILED: patch "[PATCH] mm/memory.c: fix a huge pud insertion race during faulting" failed to apply to 4.19-stable tree
2019-12-16 15:57 ` Sasha Levin
@ 2019-12-16 16:15 ` Thomas Hellstrom
0 siblings, 0 replies; 3+ messages in thread
From: Thomas Hellstrom @ 2019-12-16 16:15 UTC (permalink / raw)
To: Sasha Levin, gregkh@linuxfoundation.org, mhocko@suse.com
Cc: akpm@linux-foundation.org, arnd@arndb.de,
kirill.shutemov@linux.intel.com, stable@vger.kernel.org,
torvalds@linux-foundation.org, willy@infradead.org
Hi!
On 12/16/19 4:57 PM, Sasha Levin wrote:
> On Mon, Dec 16, 2019 at 01:00:44PM +0100, gregkh@linuxfoundation.org wrote:
>> The patch below does not apply to the 4.19-stable tree.
>> If someone wants it applied there, or to any other stable or longterm
>> tree, then please email the backport, including the original git commit
>> id to <stable@vger.kernel.org>.
>>
>> thanks,
>>
>> greg k-h
>>
>> ------------------ original commit in Linus's tree ------------------
>>
> >From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001
>> From: Thomas Hellstrom <thellstrom@vmware.com>
>> Date: Sat, 30 Nov 2019 17:51:32 -0800
>> Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
>>
>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
>> invalid pmd pointer.
>>
>> Fix this by adding a pud_trans_unstable() function similar to
>> pmd_trans_unstable() and check whether the pud is really stable before
>> using the pmd pointer.
>>
>> Race:
>> Thread 1: Thread 2: Comment
>> create_huge_pud() Fallback - not taken.
>> create_huge_pud() Taken.
>> pmd_alloc() Returns an invalid pointer.
>>
>> This will result in user-visible huge page data corruption.
>>
>> Note that this was caught during a code audit rather than a real
>> experienced problem. It looks to me like the only implementation that
>> currently creates huge pud pagetable entries is dev_dax_huge_fault()
>> which doesn't appear to care much about private (COW) mappings or
>> write-tracking which is, I believe, a prerequisite for create_huge_pud()
>> falling back on thread 1, but not in thread 2.
>>
>> Link: https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.kernel.org%2Fr%2F20191115115808.21181-2-thomas_os%40shipmail.org&data=02%7C01%7Cthellstrom%40vmware.com%7C24addc40cb56441b594408d78240a278%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637121086431698566&sdata=wAhgL%2BfbBiu2eDOb3ygPahH0OiYBLV1unSCZ0VxpAQY%3D&reserved=0
>> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> This one doesn't apply cleanly because 7635d9cbe832 ("mm, thp, proc:
> report THP eligibility for each vma") has changed what
> transparent_hugepage_enabled() does.
>
> The "right" backport here would be to simply change from calling
> __transparent_hugepage_enabled() to calling
> transparent_hugepage_enabled() as we don't have 7635d9cbe832 in older
> kernels, but I worry that if we do end up backporting some part of that
> logic change later it will diverge us from upstream and will cause for
> subtle issues that are difficult to debug.
>
> So unless Michal / Andrew yell at me for this, I'm going to take in
> 7635d9cbe832 even though it's clearly a new feature just to make
> 625110b5e9da and future patches apply cleanly, and avoid future issues.
>
Isn't this a change just in the patch context?
In any case, please see previous mails regarding additional testing of
this patch.
Thanks,
Thomas
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-12-16 16:15 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-12-16 12:00 FAILED: patch "[PATCH] mm/memory.c: fix a huge pud insertion race during faulting" failed to apply to 4.19-stable tree gregkh
2019-12-16 15:57 ` Sasha Levin
2019-12-16 16:15 ` Thomas Hellstrom
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox