* [PATCH v3 1/4] s390/mm: Fix handling of _PAGE_UNUSED pte bit
2026-06-16 16:51 [PATCH v3 0/4] KVM: s390: Fixes for gmap and _PAGE_UNUSED Claudio Imbrenda
@ 2026-06-16 16:51 ` Claudio Imbrenda
2026-06-16 16:51 ` [PATCH v3 2/4] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Claudio Imbrenda @ 2026-06-16 16:51 UTC (permalink / raw)
To: linux-kernel
Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens
The _PAGE_UNUSED softbit should not really be lying around. Its sole
purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
that the page can be discarded instead of being moved / swapped.
KVM has no way to know why a page is being unmapped, so it sets the bit
on userspace ptes corresponding to unused guest pages every time they
get unmapped. KVM has no reasonable way to clear the bit once the page
is in use again.
Without appropriate cleanup, the _PAGE_UNUSED bit will linger around
and cause guest corruption when a used page is instead thrown out.
While set_ptes() checks and clears the bit, other paths that set new
ptes did not. This led to used pages being thrown out as if they were
unused, causing guest corruption.
This patch fixes the issue by clearing the _PAGE_UNUSED bit in
set_pte(), i.e. whenever a present pte is set. The check in
set_ptes() then becomes redundant and can be removed.
Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
pte is present; the _PAGE_UNUSED bit is only defined for present ptes
and thus should not be set for non-present ptes.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
---
arch/s390/include/asm/pgtable.h | 4 ++--
arch/s390/mm/gmap_helpers.c | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index ca376a9b8e41..d03663483f76 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -980,6 +980,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
static inline void set_pte(pte_t *ptep, pte_t pte)
{
+ if (pte_present(pte))
+ pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
WRITE_ONCE(*ptep, pte);
}
@@ -1332,8 +1334,6 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t entry, unsigned int nr)
{
- if (pte_present(entry))
- entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
page_table_check_ptes_set(mm, addr, ptep, entry, nr);
for (;;) {
set_pte(ptep, entry);
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 1cfe4724fbe2..60023b6fdcb1 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -181,7 +181,8 @@ void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
if (IS_ERR_OR_NULL(ptep))
return;
- __atomic64_or(_PAGE_UNUSED, (long *)ptep);
+ if (pte_present(*ptep))
+ __atomic64_or(_PAGE_UNUSED, (long *)ptep);
pte_unmap_unlock(ptep, ptl);
}
EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
--
2.54.0
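
The invariant this patch establishes can be sketched outside the kernel. The following is a minimal userspace model, not the s390 implementation: the `model_` names and the bit positions are hypothetical and were picked only for illustration. It shows the two rules from the commit message: a present pte written through the setter never carries the unused hint, and the hint is only ever applied to present ptes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit positions, chosen for illustration only.
 * They are NOT the real s390 _PAGE_PRESENT / _PAGE_UNUSED values.
 */
#define MODEL_PAGE_PRESENT 0x001ULL
#define MODEL_PAGE_UNUSED  0x080ULL

typedef struct { uint64_t val; } model_pte_t;

static inline bool model_pte_present(model_pte_t pte)
{
	return (pte.val & MODEL_PAGE_PRESENT) != 0;
}

/* Models the patched set_pte(): strip the unused hint from present ptes. */
static inline void model_set_pte(model_pte_t *ptep, model_pte_t pte)
{
	if (model_pte_present(pte))
		pte.val &= ~MODEL_PAGE_UNUSED;
	ptep->val = pte.val;
}

/* Models the patched gmap_helper_try_set_pte_unused(): present ptes only. */
static inline void model_try_set_unused(model_pte_t *ptep)
{
	if (model_pte_present(*ptep))
		ptep->val |= MODEL_PAGE_UNUSED;
}
```

In this model, writing a present pte that still has a stale unused bit stores the value with the bit cleared, while a non-present value is stored unchanged because the bit has no defined meaning there.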
* [PATCH v3 2/4] KVM: s390: Fix dat_peek_cmma() overflow
2026-06-16 16:51 [PATCH v3 0/4] KVM: s390: Fixes for gmap and _PAGE_UNUSED Claudio Imbrenda
2026-06-16 16:51 ` [PATCH v3 1/4] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
@ 2026-06-16 16:51 ` Claudio Imbrenda
2026-06-16 17:03 ` sashiko-bot
2026-06-16 16:51 ` [PATCH v3 3/4] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-16 16:51 ` [PATCH v3 4/4] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
3 siblings, 1 reply; 7+ messages in thread
From: Claudio Imbrenda @ 2026-06-16 16:51 UTC (permalink / raw)
To: linux-kernel
Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens
If userspace passes a start address that is out of bounds,
_dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
touched and will stay 0. This will cause *count to underflow and report
a very high number, and the function will end up erroneously reporting
success.
Fix by only setting *count if the end address is not smaller than the
starting address. This way invalid starting addresses will correctly
return -EFAULT and *count will correctly indicate that no values have
been returned.
Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
arch/s390/kvm/dat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index 4a41c0247ffa..cffac7782c4b 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1209,7 +1209,7 @@ int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
int rc;
rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
- *count = state.end - start;
+ *count = state.end >= start ? state.end - start : 0;
/* Return success if at least one value was saved, otherwise an error. */
return (rc == -EFAULT && *count > 0) ? 0 : rc;
}
--
2.54.0
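
The arithmetic behind this fix can be reproduced in plain C. The sketch below is a userspace model under stated assumptions: `gfn_t` is taken to be a 64-bit unsigned integer, and the `model_` helpers are hypothetical names that only mirror the two expressions touched by the patch, not the real `dat_peek_cmma()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t gfn_t;	/* assumption: 64-bit unsigned frame number */

/* Old computation: wraps around when end < start. */
static inline unsigned int model_count_old(gfn_t start, gfn_t end)
{
	return (unsigned int)(end - start);
}

/* Patched computation: report zero values when end < start. */
static inline unsigned int model_count_new(gfn_t start, gfn_t end)
{
	return end >= start ? (unsigned int)(end - start) : 0;
}

/* Mirrors the return expression: -EFAULT becomes success only if count > 0. */
static inline int model_result(int rc, unsigned int count)
{
	return (rc == -EFAULT && count > 0) ? 0 : rc;
}
```

With `start = 0x100000` and an untouched `end = 0`, the old expression yields `0xFFF00000` after truncation to `unsigned int`, so `model_result(-EFAULT, ...)` turns the failure into success. The patched expression yields 0 and the `-EFAULT` is passed through.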
* Re: [PATCH v3 2/4] KVM: s390: Fix dat_peek_cmma() overflow
2026-06-16 16:51 ` [PATCH v3 2/4] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
@ 2026-06-16 17:03 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-16 17:03 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: kvm, linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] kvm_s390_get_cmma_bits() leaks uninitialized kernel heap memory to userspace because the vmalloc buffer is partially populated and skipped indices remain uninitialized.
--
commit f6af8b3c3817150f3fd3255f80c30b252be59298
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
KVM: s390: Fix dat_peek_cmma() overflow
If userspace passes a start address that is out of bounds,
_dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
touched and will stay 0. This will cause *count to underflow and report
a very high number, and the function will end up erroneously reporting
success.
> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index 4a41c0247ffa2..cffac7782c4bf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1209,7 +1209,7 @@ int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
> int rc;
>
> rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
> - *count = state.end - start;
> + *count = state.end >= start ? state.end - start : 0;
[Severity: Critical]
This is a pre-existing issue, but I noticed a potential information leak to
userspace related to how this count is populated.
In arch/s390/kvm/kvm-s390.c:kvm_s390_get_cmma_bits(), the buffer is
allocated with vmalloc() instead of vzalloc():
values = vmalloc(args->count);
During the walk, __dat_peek_cmma_crste() can skip unmapped regions by
advancing state->end without writing to the values array:
if (crstep->h.i)
state->end = min(walk->end, next);
Similarly, __dat_get_cmma_pte() can skip clean pages within the max bit
distance without writing anything:
if (!READ_ONCE(*pgste_of(ptep)).cmma_d)
return 0;
Since this *count value directly determines how many bytes copy_to_user()
copies in kvm_s390_get_cmma_bits(), can this leak uninitialized kernel
memory to userspace if the skipped entries leave gaps in the array?
> /* Return success if at least one value was saved, otherwise an error. */
> return (rc == -EFAULT && *count > 0) ? 0 : rc;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616165110.360921-1-imbrenda@linux.ibm.com?part=2
* [PATCH v3 3/4] KVM: s390: Do not set special large pages dirty
2026-06-16 16:51 [PATCH v3 0/4] KVM: s390: Fixes for gmap and _PAGE_UNUSED Claudio Imbrenda
2026-06-16 16:51 ` [PATCH v3 1/4] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-16 16:51 ` [PATCH v3 2/4] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
@ 2026-06-16 16:51 ` Claudio Imbrenda
2026-06-16 17:10 ` sashiko-bot
2026-06-16 16:51 ` [PATCH v3 4/4] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
3 siblings, 1 reply; 7+ messages in thread
From: Claudio Imbrenda @ 2026-06-16 16:51 UTC (permalink / raw)
To: linux-kernel
Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens
Special pages / folios should not be set dirty. This also applies to
large pages.
Add a missing check in gmap_clear_young_crste() to prevent setting the
large page dirty if it is a large page.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
---
arch/s390/kvm/gmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 52d55ddea8d4..3192f610f696 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -327,7 +327,7 @@ static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, st
new.h.i = 1;
new.s.fc1.y = 0;
new.s.fc1.prefix_notif = 0;
- if (new.s.fc1.d || !new.h.p)
+ if ((new.s.fc1.d || !new.h.p) && !new.s.fc1.s)
folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
new.s.fc1.d = 0;
new.h.p = 1;
--
2.54.0
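
The changed condition can be read as a small predicate. The sketch below is only a model: the function names are hypothetical, and the interpretation of the three flags (`d` as the dirty bit, `p` as the protection bit, `s` as the special bit) is an assumption based on the field names in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: dirty, or not protected. */
static inline bool model_dirty_old(bool d, bool p, bool s)
{
	(void)s;	/* the special bit was not consulted */
	return d || !p;
}

/* Condition after the patch: same as before, but never for special entries. */
static inline bool model_dirty_new(bool d, bool p, bool s)
{
	return (d || !p) && !s;
}
```

The two predicates agree whenever `s` is false; they differ only for special entries, where the patched version never asks for the folio to be marked dirty.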
* Re: [PATCH v3 3/4] KVM: s390: Do not set special large pages dirty
2026-06-16 16:51 ` [PATCH v3 3/4] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
@ 2026-06-16 17:10 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-16 17:10 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: Vasily Gorbik, Alexander Gordeev, kvm, Heiko Carstens, linux-s390
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [Low] Commit message and code mismatch regarding 'large' vs 'special' page.
Pre-existing issues:
- [Critical] Missing checks for 'special' page table entries before resolving to folios, leading to host kernel crashes during unmap or VM teardown.
--
commit 326f6a92844d8bfeb743555b0b3e68a96053b8ea
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
KVM: s390: Do not set special large pages dirty
This commit prevents setting special pages or folios dirty, adding a check
specifically for large pages in gmap_clear_young_crste().
> Add a missing check in gmap_clear_young_crste() to prevent setting the
> large page dirty if it is a large page.
[Severity: Low]
This isn't a bug, but there is a typo in the commit message. The code adds
a check for the special bit (!new.s.fc1.s), but the message says "if it is
a large page" instead of "if it is a special page". Should the commit
message be updated to match the code?
> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index 52d55ddea8d4f..3192f610f6962 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -327,7 +327,7 @@ static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, st
> new.h.i = 1;
> new.s.fc1.y = 0;
> new.s.fc1.prefix_notif = 0;
> - if (new.s.fc1.d || !new.h.p)
> + if ((new.s.fc1.d || !new.h.p) && !new.s.fc1.s)
> folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
[Severity: Critical]
This is a pre-existing issue, but since this patch fixes a missing special
bit check here, are there other places missing this exact same check that
could lead to a host kernel crash?
For example, in arch/s390/kvm/gmap.c, _gmap_unmap_crste() calls
phys_to_folio() on a large page without checking if it is a special page:
if (old.s.fc1.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
folio = phys_to_folio(crste_origin_large(old));
Similarly, _gmap_unmap_pte() calls pfn_folio() on a standard PTE without
checking the special bit (ptep->s.s):
if (ptep->s.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
folio = pfn_folio(ptep->h.pfra);
_destroy_pages_crste() and _destroy_pages_pte() also appear to
unconditionally resolve PFNs to page structs without checking the special
bit before passing them to __kvm_s390_pv_destroy_page().
Could a Secure/PV VM mapping an IO memory region directly (like via VFIO)
cause a local denial of service when this memory is unmapped or the VM is
destroyed, because these IO-mapped PFNs lack a valid struct page?
Also, in ptep_get_and_clear() in arch/s390/include/asm/pgtable.h,
uv_convert_from_secure_pte() is called on any present PTE in a protected mm
without verifying !pte_special(res):
if (mm_is_protected(mm) && pte_present(res))
WARN_ON_ONCE(uv_convert_from_secure_pte(res));
This unconditionally calls pfn_folio() on the PFN derived from the host
PTE in arch/s390/kernel/uv.c:uv_convert_from_secure_pte(). Should these
paths also be updated to ensure they check for special page table entries
before resolving them to folios?
> new.s.fc1.d = 0;
> new.h.p = 1;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616165110.360921-1-imbrenda@linux.ibm.com?part=3
* [PATCH v3 4/4] KVM: s390: Fix code typo in gmap_protect_asce_top_level()
2026-06-16 16:51 [PATCH v3 0/4] KVM: s390: Fixes for gmap and _PAGE_UNUSED Claudio Imbrenda
` (2 preceding siblings ...)
2026-06-16 16:51 ` [PATCH v3 3/4] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
@ 2026-06-16 16:51 ` Claudio Imbrenda
3 siblings, 0 replies; 7+ messages in thread
From: Claudio Imbrenda @ 2026-06-16 16:51 UTC (permalink / raw)
To: linux-kernel
Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens
The correct length to pass to kvm_s390_get_guest_pages() is asce.tl + 1,
not asce.dt + 1. It was a typo, which, due to fortuitous circumstances,
did not cause bugs. It should nonetheless be fixed.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE")
---
arch/s390/kvm/gmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 3192f610f696..e6e786811db8 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -1262,7 +1262,7 @@ static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gma
/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
smp_rmb();
- rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + 1, false);
+ rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.tl + 1, false);
if (rc > 0)
rc = -EFAULT;
if (!rc)
--
2.54.0
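
To make the difference between the two fields concrete, here is a small decoding sketch. It assumes the architected ASCE layout in which the table length (TL) occupies the two least significant bits and the designation type (DT) the two bits above it; the `model_` helpers are hypothetical and use explicit masks rather than the kernel's `union asce` bitfields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: TL in bits 0-1, DT in bits 2-3 (counting from the LSB). */
#define MODEL_ASCE_TL_MASK 0x03ULL
#define MODEL_ASCE_DT_MASK 0x0cULL

static inline unsigned int model_asce_tl(uint64_t asce)
{
	return (unsigned int)(asce & MODEL_ASCE_TL_MASK);
}

static inline unsigned int model_asce_dt(uint64_t asce)
{
	return (unsigned int)((asce & MODEL_ASCE_DT_MASK) >> 2);
}

/* Length of the top-level table as the patch computes it: tl + 1. */
static inline unsigned int model_top_level_len(uint64_t asce)
{
	return model_asce_tl(asce) + 1;
}
```

For an ASCE with `dt = 3` and `tl = 1`, the patched expression gives a length of 2, whereas `dt + 1` would give 4. The two expressions coincide only when both fields happen to hold the same value.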