* FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.6-stable tree
@ 2024-10-18 7:57 gregkh
2024-10-22 9:07 ` [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: gregkh @ 2024-10-18 7:57 UTC (permalink / raw)
To: david, akpm, bfu, borntraeger, frankja, hughd, imbrenda,
ryan.roberts, stable, thuth, wangkefeng.wang, willy
Cc: stable
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 2b0f922323ccfa76219bcaacd35cd50aeaa13592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024101842-empty-espresso-c8a3@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0f922323ccfa76219bcaacd35cd50aeaa13592 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Fri, 11 Oct 2024 12:24:45 +0200
Subject: [PATCH] mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index c0869a962ddd..30feedabc932 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ /*
+ * It is too late to allocate a small folio, we already have a large
+ * folio in the pagecache: especially s390 KVM cannot tolerate any
+ * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+ * PMD mappings if THPs are disabled.
+ */
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+ return ret;
+
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;
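
For readers following the thread, the early return added to do_set_pmd()
can be sketched as a small userspace model. This is only an illustration
under stated assumptions: every model_* / MODEL_* name below is a
hypothetical stand-in, not a kernel definition, and the real function does
considerably more work after these checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flag; the kernel's VM_NOHUGEPAGE has its own value. */
#define MODEL_VM_NOHUGEPAGE (1UL << 0)

enum model_result { MODEL_MAPPED_PMD, MODEL_FALLBACK };

struct model_vma {
	unsigned long vm_flags;	/* models vma->vm_flags (madvise) */
	bool mm_disable_thp;	/* models MMF_DISABLE_THP on the owning mm (prctl) */
	bool range_suitable;	/* models the alignment/size suitability check */
};

/* Models the TRANSPARENT_HUGEPAGE_UNSUPPORTED bit. */
static bool model_hw_unsupported;

static bool model_thp_disabled_by_hw(void)
{
	return model_hw_unsupported;
}

static bool model_vma_thp_disabled(const struct model_vma *vma,
				   unsigned long vm_flags)
{
	return (vm_flags & MODEL_VM_NOHUGEPAGE) || vma->mm_disable_thp;
}

/* Mirrors the order of the checks at the top of the patched function. */
static enum model_result model_do_set_pmd(const struct model_vma *vma)
{
	enum model_result ret = MODEL_FALLBACK;

	/* New gate: refuse a PMD mapping if THP is disabled by hw/process/vma. */
	if (model_thp_disabled_by_hw() ||
	    model_vma_thp_disabled(vma, vma->vm_flags))
		return ret;

	/* Pre-existing gate: the range must be suitable for a PMD mapping. */
	if (!vma->range_suitable)
		return ret;

	return MODEL_MAPPED_PMD;
}

/* Usage example: each way of disabling THP forces the fallback path. */
static void model_gate_selftest(void)
{
	struct model_vma vma = {
		.vm_flags = 0, .mm_disable_thp = false, .range_suitable = true,
	};

	model_hw_unsupported = false;
	assert(model_do_set_pmd(&vma) == MODEL_MAPPED_PMD);

	vma.vm_flags = MODEL_VM_NOHUGEPAGE;	/* per-VMA opt-out */
	assert(model_do_set_pmd(&vma) == MODEL_FALLBACK);

	vma.vm_flags = 0;
	vma.mm_disable_thp = true;		/* per-process opt-out */
	assert(model_do_set_pmd(&vma) == MODEL_FALLBACK);

	vma.mm_disable_thp = false;
	model_hw_unsupported = true;		/* hardware opt-out */
	assert(model_do_set_pmd(&vma) == MODEL_FALLBACK);

	model_hw_unsupported = false;
}
```

In the model, as in the patch, the fallback result is what lets the caller
map the already-allocated large folio with PTEs instead.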
* [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
2024-10-18 7:57 FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.6-stable tree gregkh
@ 2024-10-22 9:07 ` David Hildenbrand
2024-11-05 12:56 ` Petr Vaněk
2024-10-22 9:09 ` [PATCH 6.6.y] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 0/2] " David Hildenbrand
2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-10-22 9:07 UTC (permalink / raw)
To: stable
Cc: Kefeng Wang, David Hildenbrand, Leo Fu, Thomas Huth, Ryan Roberts,
Christian Borntraeger, Claudio Imbrenda, Hugh Dickins,
Janosch Frank, Matthew Wilcox
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.
For s390 KVM this currently means that a VM backed by a file located on
a filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 963756aac1f011d904ddd9548ae82286d3a91f96)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
The change in mm/shmem.c does not exist yet.
This patch is required to backport the fix
2b0f922323ccfa76219bcaacd35cd50aeaa13592, for which a backport will
be sent separately in reply to the "FAILED: ..." mail.
---
include/linux/huge_mm.h | 18 ++++++++++++++++++
mm/huge_memory.c | 15 ++-------------
2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fa0350b0812a..fc789c0ac85b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -137,6 +137,24 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+ unsigned long vm_flags)
+{
+ /*
+ * Explicitly disabled through madvise or prctl, or some
+ * architectures may disable THP for some mappings, for
+ * example, s390 kvm.
+ */
+ return (vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+ /* If the hardware/firmware marked hugepage support disabled. */
+ return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9aea11b1477c..dfd6577225d8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -78,19 +78,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
if (!vma->vm_mm) /* vdso */
return false;
- /*
- * Explicitly disabled through madvise or prctl, or some
- * architectures may disable THP for some mappings, for
- * example, s390 kvm.
- * */
- if ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- return false;
- /*
- * If the hardware/firmware marked hugepage support disabled.
- */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
- return false;
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
+ return 0;
/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
if (vma_is_dax(vma))
--
2.46.1
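
The refactoring in this patch is meant to be behavior-preserving: the two
open-coded checks in hugepage_vma_check() are replaced by one call to each
helper. A minimal userspace sketch, assuming simplified stand-ins for the
kernel state (all model_* / MODEL_* names are hypothetical), compares the
old and new forms over every combination of the three inputs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values; the kernel definitions differ. */
#define MODEL_VM_NOHUGEPAGE	(1UL << 0)
#define MODEL_THP_UNSUPPORTED	0	/* bit index, like the kernel enum */

/* Models the global transparent_hugepage_flags word. */
static unsigned long model_thp_flags;

struct model_vma {
	bool mm_disable_thp;	/* models MMF_DISABLE_THP on the owning mm */
};

/* Old form: two open-coded checks, as removed by the patch. */
static bool model_old_allows(const struct model_vma *vma,
			     unsigned long vm_flags)
{
	if ((vm_flags & MODEL_VM_NOHUGEPAGE) || vma->mm_disable_thp)
		return false;
	if (model_thp_flags & (1UL << MODEL_THP_UNSUPPORTED))
		return false;
	return true;
}

/* New helpers, shaped like the ones added to huge_mm.h. */
static bool model_vma_thp_disabled(const struct model_vma *vma,
				   unsigned long vm_flags)
{
	return (vm_flags & MODEL_VM_NOHUGEPAGE) || vma->mm_disable_thp;
}

static bool model_thp_disabled_by_hw(void)
{
	return model_thp_flags & (1UL << MODEL_THP_UNSUPPORTED);
}

/* New form: a single combined check built from the helpers. */
static bool model_new_allows(const struct model_vma *vma,
			     unsigned long vm_flags)
{
	if (model_thp_disabled_by_hw() || model_vma_thp_disabled(vma, vm_flags))
		return false;
	return true;
}

/* Checks all 8 input combinations; returns how many were compared. */
static int model_check_equivalence(void)
{
	int checked = 0;

	for (int hw = 0; hw < 2; hw++) {
		for (int nohuge = 0; nohuge < 2; nohuge++) {
			for (int mmf = 0; mmf < 2; mmf++) {
				struct model_vma vma = { .mm_disable_thp = mmf };
				unsigned long vm_flags =
					nohuge ? MODEL_VM_NOHUGEPAGE : 0;

				model_thp_flags = hw ?
					(1UL << MODEL_THP_UNSUPPORTED) : 0;
				assert(model_old_allows(&vma, vm_flags) ==
				       model_new_allows(&vma, vm_flags));
				checked++;
			}
		}
	}
	model_thp_flags = 0;
	return checked;
}
```

The only ordering difference is that the hardware check now runs first;
since every check returns the same result, the outcome in the model does
not depend on that order.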
* [PATCH 6.6.y] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
2024-10-18 7:57 FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.6-stable tree gregkh
2024-10-22 9:07 ` [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
@ 2024-10-22 9:09 ` David Hildenbrand
2024-11-04 15:42 ` David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 0/2] " David Hildenbrand
2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-10-22 9:09 UTC (permalink / raw)
To: stable
Cc: David Hildenbrand, Leo Fu, Thomas Huth, Matthew Wilcox (Oracle),
Ryan Roberts, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Hugh Dickins, Kefeng Wang
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2b0f922323ccfa76219bcaacd35cd50aeaa13592)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
Minor contextual difference.
Note that the backport of 963756aac1f011d904ddd9548ae82286d3a91f96 is
required (sent separately as a reply to the "FAILED:" mail).
---
mm/memory.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index b6ddfe22c5d5..742c2f65c2c8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4293,6 +4293,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ /*
+ * It is too late to allocate a small folio, we already have a large
+ * folio in the pagecache: especially s390 KVM cannot tolerate any
+ * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+ * PMD mappings if THPs are disabled.
+ */
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+ return ret;
+
if (!transhuge_vma_suitable(vma, haddr))
return ret;
--
2.46.1
* Re: [PATCH 6.6.y] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
2024-10-22 9:09 ` [PATCH 6.6.y] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma David Hildenbrand
@ 2024-11-04 15:42 ` David Hildenbrand
0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-11-04 15:42 UTC (permalink / raw)
To: stable
Cc: Leo Fu, Thomas Huth, Matthew Wilcox (Oracle), Ryan Roberts,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Hugh Dickins, Kefeng Wang, Greg KH
Gentle ping, XEN PV users reported an issue fixed by this fix upstream.
On 22.10.24 11:09, David Hildenbrand wrote:
> We (or rather, readahead logic :) ) might be allocating a THP in the
> pagecache and then try mapping it into a process that explicitly disabled
> THP: we might end up installing PMD mappings.
>
> This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
> THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
> starting the VM.
>
> For example, starting a VM backed on a file system with large folios
> supported makes the VM crash when the VM tries accessing such a mapping
> using KVM.
>
> Is it also a problem when the HW disabled THP using
> TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
> without X86_FEATURE_PSE.
>
> In the future, we might be able to do better on s390x and only disallow
> PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
> really wants. For now, fix it by essentially performing the same check as
> would be done in __thp_vma_allowable_orders() or in shmem code, where this
> works as expected, and disallow PMD mappings, making us fallback to PTE
> mappings.
>
> Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Reported-by: Leo Fu <bfu@redhat.com>
> Tested-by: Thomas Huth <thuth@redhat.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> (cherry picked from commit 2b0f922323ccfa76219bcaacd35cd50aeaa13592)
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>
> Minor contextual difference.
>
> Note that the backport of 963756aac1f011d904ddd9548ae82286d3a91f96 is
> required (sent separately as a reply to the "FAILED:" mail).
>
> ---
> mm/memory.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b6ddfe22c5d5..742c2f65c2c8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4293,6 +4293,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> pmd_t entry;
> vm_fault_t ret = VM_FAULT_FALLBACK;
>
> + /*
> + * It is too late to allocate a small folio, we already have a large
> + * folio in the pagecache: especially s390 KVM cannot tolerate any
> + * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
> + * PMD mappings if THPs are disabled.
> + */
> + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
> + return ret;
> +
> if (!transhuge_vma_suitable(vma, haddr))
> return ret;
>
--
Cheers,
David / dhildenb
* Re: [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
2024-10-22 9:07 ` [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
@ 2024-11-05 12:56 ` Petr Vaněk
2024-11-05 16:30 ` David Hildenbrand
0 siblings, 1 reply; 11+ messages in thread
From: Petr Vaněk @ 2024-11-05 12:56 UTC (permalink / raw)
To: David Hildenbrand
Cc: stable, Kefeng Wang, Leo Fu, Thomas Huth, Ryan Roberts,
Christian Borntraeger, Claudio Imbrenda, Hugh Dickins,
Janosch Frank, Matthew Wilcox
Hi David,
On Tue, Oct 22, 2024 at 11:07:55AM +0200, David Hildenbrand wrote:
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9aea11b1477c..dfd6577225d8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -78,19 +78,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> if (!vma->vm_mm) /* vdso */
> return false;
>
> - /*
> - * Explicitly disabled through madvise or prctl, or some
> - * architectures may disable THP for some mappings, for
> - * example, s390 kvm.
> - * */
> - if ((vm_flags & VM_NOHUGEPAGE) ||
> - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> - return false;
> - /*
> - * If the hardware/firmware marked hugepage support disabled.
> - */
> - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
> - return false;
> + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
> + return 0;
Shouldn't this return false for consistency with the rest of the
function?
> /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> if (vma_is_dax(vma))
> --
> 2.46.1
>
Petr
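
As background to the review comment above: in a function declared to
return bool, "return 0;" and "return false;" behave identically, so the
request is about consistency rather than a functional bug. A minimal
sketch, with hypothetical function names that do not appear in the kernel:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate written with "return 0;", as in the v1 backport. */
static bool model_allowed_v1(bool thp_disabled)
{
	if (thp_disabled)
		return 0;	/* converted to bool, i.e. false */
	return true;
}

/* Same predicate written with "return false;", matching the other returns. */
static bool model_allowed_v2(bool thp_disabled)
{
	if (thp_disabled)
		return false;
	return true;
}

/* Usage example: both variants agree for every input. */
static void model_compare_variants(void)
{
	assert(model_allowed_v1(true) == model_allowed_v2(true));
	assert(model_allowed_v1(false) == model_allowed_v2(false));
	assert(model_allowed_v1(true) == false);
}
```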
* Re: [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
2024-11-05 12:56 ` Petr Vaněk
@ 2024-11-05 16:30 ` David Hildenbrand
2024-11-05 16:38 ` Greg KH
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-11-05 16:30 UTC (permalink / raw)
To: Petr Vaněk
Cc: stable, Kefeng Wang, Leo Fu, Thomas Huth, Ryan Roberts,
Christian Borntraeger, Claudio Imbrenda, Hugh Dickins,
Janosch Frank, Matthew Wilcox
On 05.11.24 13:56, Petr Vaněk wrote:
> Hi David,
>
Hi Petr,
> On Tue, Oct 22, 2024 at 11:07:55AM +0200, David Hildenbrand wrote:
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 9aea11b1477c..dfd6577225d8 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -78,19 +78,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>> if (!vma->vm_mm) /* vdso */
>> return false;
>>
>> - /*
>> - * Explicitly disabled through madvise or prctl, or some
>> - * architectures may disable THP for some mappings, for
>> - * example, s390 kvm.
>> - * */
>> - if ((vm_flags & VM_NOHUGEPAGE) ||
>> - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> - return false;
>> - /*
>> - * If the hardware/firmware marked hugepage support disabled.
>> - */
>> - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
>> - return false;
>> + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
>> + return 0;
>
> Shouldn't this return false for consistency with the rest of the
> function?
Yes, that's better. Same applies to the 6.1.y backport of this.
--
Cheers,
David / dhildenb
* Re: [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
2024-11-05 16:30 ` David Hildenbrand
@ 2024-11-05 16:38 ` Greg KH
0 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-11-05 16:38 UTC (permalink / raw)
To: David Hildenbrand
Cc: Petr Vaněk, stable, Kefeng Wang, Leo Fu, Thomas Huth,
Ryan Roberts, Christian Borntraeger, Claudio Imbrenda,
Hugh Dickins, Janosch Frank, Matthew Wilcox
On Tue, Nov 05, 2024 at 05:30:14PM +0100, David Hildenbrand wrote:
> On 05.11.24 13:56, Petr Vaněk wrote:
> > Hi David,
> >
>
> Hi Petr,
>
> > On Tue, Oct 22, 2024 at 11:07:55AM +0200, David Hildenbrand wrote:
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 9aea11b1477c..dfd6577225d8 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -78,19 +78,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> > > if (!vma->vm_mm) /* vdso */
> > > return false;
> > > - /*
> > > - * Explicitly disabled through madvise or prctl, or some
> > > - * architectures may disable THP for some mappings, for
> > > - * example, s390 kvm.
> > > - * */
> > > - if ((vm_flags & VM_NOHUGEPAGE) ||
> > > - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > - return false;
> > > - /*
> > > - * If the hardware/firmware marked hugepage support disabled.
> > > - */
> > > - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
> > > - return false;
> > > + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
> > > + return 0;
> >
> > Shouldn't this return false for consistency with the rest of the
> > function?
>
> Yes, that's better. Same applies to the 6.1.y backport of this.
Ok, dropping this from the review queue, please resend the updated
versions.
thanks,
greg k-h
* [PATCH 6.6.y 0/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
2024-10-18 7:57 FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.6-stable tree gregkh
2024-10-22 9:07 ` [PATCH 6.6.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
2024-10-22 9:09 ` [PATCH 6.6.y] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma David Hildenbrand
@ 2024-11-05 17:25 ` David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 1/2] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
` (2 more replies)
2 siblings, 3 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-11-05 17:25 UTC (permalink / raw)
To: stable; +Cc: Petr Vaněk, Greg KH, David Hildenbrand
Resending both patches in one series now, easier for everybody that way.
Conflicts:
* "mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()"
-> The change in mm/shmem.c does not exist yet.
-> Small contextual conflict.
* "mm: don't install PMD mappings when THPs are disabled by the hw/process/vma"
-> Small contextual conflict.
v1 -> v2:
* "mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()"
-> Keep using "return false;" instead of "return 0;"
David Hildenbrand (1):
mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma
Kefeng Wang (1):
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
include/linux/huge_mm.h | 18 ++++++++++++++++++
mm/huge_memory.c | 13 +------------
mm/memory.c | 9 +++++++++
3 files changed, 28 insertions(+), 12 deletions(-)
--
2.47.0
* [PATCH 6.6.y 1/2] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
2024-11-05 17:25 ` [PATCH 6.6.y 0/2] " David Hildenbrand
@ 2024-11-05 17:25 ` David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 2/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma David Hildenbrand
2024-11-06 7:23 ` [PATCH 6.6.y 0/2] " Greg KH
2 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-11-05 17:25 UTC (permalink / raw)
To: stable
Cc: Petr Vaněk, Greg KH, David Hildenbrand, Kefeng Wang, Leo Fu,
Thomas Huth, Ryan Roberts, Christian Borntraeger,
Claudio Imbrenda, Hugh Dickins, Janosch Frank, Matthew Wilcox
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.
For s390 KVM this currently means that a VM backed by a file located on
a filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 963756aac1f011d904ddd9548ae82286d3a91f96)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/huge_mm.h | 18 ++++++++++++++++++
mm/huge_memory.c | 13 +------------
2 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fa0350b0812ab..fc789c0ac85b8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -137,6 +137,24 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+ unsigned long vm_flags)
+{
+ /*
+ * Explicitly disabled through madvise or prctl, or some
+ * architectures may disable THP for some mappings, for
+ * example, s390 kvm.
+ */
+ return (vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+ /* If the hardware/firmware marked hugepage support disabled. */
+ return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9aea11b1477c8..7b4cb5c68b61b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -78,18 +78,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
if (!vma->vm_mm) /* vdso */
return false;
- /*
- * Explicitly disabled through madvise or prctl, or some
- * architectures may disable THP for some mappings, for
- * example, s390 kvm.
- * */
- if ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- return false;
- /*
- * If the hardware/firmware marked hugepage support disabled.
- */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
return false;
/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
--
2.47.0
* [PATCH 6.6.y 2/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
2024-11-05 17:25 ` [PATCH 6.6.y 0/2] " David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 1/2] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
@ 2024-11-05 17:25 ` David Hildenbrand
2024-11-06 7:23 ` [PATCH 6.6.y 0/2] " Greg KH
2 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-11-05 17:25 UTC (permalink / raw)
To: stable
Cc: Petr Vaněk, Greg KH, David Hildenbrand, Leo Fu, Thomas Huth,
Matthew Wilcox (Oracle), Ryan Roberts, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda, Hugh Dickins, Kefeng Wang
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2b0f922323ccfa76219bcaacd35cd50aeaa13592)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index b6ddfe22c5d5c..742c2f65c2c85 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4293,6 +4293,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ /*
+ * It is too late to allocate a small folio, we already have a large
+ * folio in the pagecache: especially s390 KVM cannot tolerate any
+ * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+ * PMD mappings if THPs are disabled.
+ */
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+ return ret;
+
if (!transhuge_vma_suitable(vma, haddr))
return ret;
--
2.47.0
* Re: [PATCH 6.6.y 0/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
2024-11-05 17:25 ` [PATCH 6.6.y 0/2] " David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 1/2] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() David Hildenbrand
2024-11-05 17:25 ` [PATCH 6.6.y 2/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma David Hildenbrand
@ 2024-11-06 7:23 ` Greg KH
2 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-11-06 7:23 UTC (permalink / raw)
To: David Hildenbrand; +Cc: stable, Petr Vaněk
On Tue, Nov 05, 2024 at 06:25:48PM +0100, David Hildenbrand wrote:
> Resending both patches in one series now, easier for everybody that way.
>
Much better, now queued up, thanks!
greg k-h