[PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes

Kernel KVM virtualization development
 help / color / mirror / Atom feed

* [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes
@ 2026-06-22 16:07 Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 1/9] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

Fix handling of _PAGE_UNUSED pte bit. The bit was lingering around
longer than it should have, causing issues. This is yet a different
approach from the previous version.

Fix some cmma dirty tracking functions.

Also fix a few other small issues in gmap.

v4-v5:
* Make sure the counter of cmma-dirty pages does not explode
* Improve comments in cmma selftest
* Rework and fix locking in kvm_s390_set_mem_control()
* Do not return success if an allocation in kvm_s390_set_cmma_bits() fails

v3->v4:
* Improve/fix descriptions of patches 1 and 3
* Fix handle_{sske,pfmf} under memory pressure
* Fix cmma dirty tracking and its selftest

v2->v3:
* Fix _PAGE_UNUSED in yet a different way
* Fix a potential overflow in dat_peek_cmma()
* Some minor gmap fixes that are not actual bugs yet

v1->v2:
* Completely different approach, entirely new patch

Claudio Imbrenda (9):
  s390/mm: Fix handling of _PAGE_UNUSED pte bit
  KVM: s390: Fix dat_peek_cmma() overflow
  KVM: s390: Do not set special large pages dirty
  KVM: s390: Fix code typo in gmap_protect_asce_top_level()
  KVM: s390: Fix handle_{sske,pfmf} under memory pressure
  KVM: s390: Fix locking in kvm_s390_set_mem_control()
  KVM: s390: Fix cmma dirty tracking
  KVM: s390: selftests: Fix cmma selftest
  KVM: s390: Return failure in case of failure in
    kvm_s390_set_cmma_bits()

 arch/s390/include/asm/pgtable.h              |  4 +--
 arch/s390/kvm/dat.c                          |  5 ++-
 arch/s390/kvm/gmap.c                         | 15 ++++++---
 arch/s390/kvm/kvm-s390.c                     | 35 ++++++++------------
 arch/s390/kvm/priv.c                         | 10 ++++--
 arch/s390/mm/gmap_helpers.c                  |  3 +-
 tools/testing/selftests/kvm/s390/cmma_test.c |  6 ++++
 7 files changed, 46 insertions(+), 32 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 1/9] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

The _PAGE_UNUSED softbit should not really be lying around. Its sole
purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
that the page can be discarded instead of being moved / swapped.

KVM has no way to know why a page is being unmapped, so it sets the bit
on userspace ptes corresponding to unused guest pages every time they
get unmapped. KVM has no reasonable way to clear the bit once the page
is in use again.

While set_ptes() checks and clears the bit, other paths that set new
ptes did not. This led to used pages being thrown out as if they were
unused, causing guest corruption.

Fix the issue by clearing the _PAGE_UNUSED bit for present ptes in
set_pte(), i.e. whenever a present pte is getting set. The check in
set_ptes() is then redundant and can be removed.

Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
pte is present; the _PAGE_UNUSED bit is only defined for present ptes
and thus should not be set for non-present ptes.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 4 ++--
 arch/s390/mm/gmap_helpers.c     | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index ca376a9b8e41..d03663483f76 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -980,6 +980,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
+	if (pte_present(pte))
+		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
 	WRITE_ONCE(*ptep, pte);
 }
 
@@ -1332,8 +1334,6 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t entry, unsigned int nr)
 {
-	if (pte_present(entry))
-		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
 	page_table_check_ptes_set(mm, addr, ptep, entry, nr);
 	for (;;) {
 		set_pte(ptep, entry);
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 1cfe4724fbe2..60023b6fdcb1 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -181,7 +181,8 @@ void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
 	if (IS_ERR_OR_NULL(ptep))
 		return;
 
-	__atomic64_or(_PAGE_UNUSED, (long *)ptep);
+	if (pte_present(*ptep))
+		__atomic64_or(_PAGE_UNUSED, (long *)ptep);
 	pte_unmap_unlock(ptep, ptl);
 }
 EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 1/9] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:18   ` Christian Borntraeger
  2026-06-22 16:23   ` sashiko-bot
  2026-06-22 16:07 ` [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

If userspace passes a start address that is out of bounds,
_dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
touched and will stay 0. This will cause *count to underflow and report
a very high number, and the function will end up erroneously reporting
success.

Fix by only setting *count if the end address is not smaller than the
starting address. This way invalid starting addresses will correctly
return -EFAULT and *count will correctly indicate that no values have
been returned.

Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index 4a41c0247ffa..cffac7782c4b 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1209,7 +1209,7 @@ int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
 	int rc;
 
 	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
-	*count = state.end - start;
+	*count = state.end >= start ? state.end - start : 0;
 	/* Return success if at least one value was saved, otherwise an error. */
 	return (rc == -EFAULT && *count > 0) ? 0 : rc;
 }
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 1/9] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:27   ` sashiko-bot
  2026-06-22 16:07 ` [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

Special pages / folios should not be set dirty. This also applies to
large pages.

Add a missing check in gmap_clear_young_crste() to prevent setting the
large page dirty if it is a special page.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
---
 arch/s390/kvm/gmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 52d55ddea8d4..3192f610f696 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -327,7 +327,7 @@ static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, st
 		new.h.i = 1;
 		new.s.fc1.y = 0;
 		new.s.fc1.prefix_notif = 0;
-		if (new.s.fc1.d || !new.h.p)
+		if ((new.s.fc1.d || !new.h.p) && !new.s.fc1.s)
 			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
 		new.s.fc1.d = 0;
 		new.h.p = 1;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level()
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (2 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:24   ` sashiko-bot
  2026-06-22 16:07 ` [PATCH v5 5/9] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

The correct length to pass to kvm_s390_get_guest_pages() is asce.tl + 1,
not asce.dt + 1. It was a typo, which, due to fortuitous circumstances,
did not cause bugs. It should nonetheless be fixed.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE")
---
 arch/s390/kvm/gmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 3192f610f696..e6e786811db8 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -1262,7 +1262,7 @@ static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gma
 	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
 	smp_rmb();
 
-	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + 1, false);
+	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.tl + 1, false);
 	if (rc > 0)
 		rc = -EFAULT;
 	if (!rc)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 5/9] KVM: s390: Fix handle_{sske,pfmf} under memory pressure
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (3 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

Under heavy memory pressure, handle_sske() and handle_pfmf() might
cause an endless loop if the mmu cache runs empty, the atomic
allocations fail, and the top-up function also fails. While quite
unlikely, that scenario is not impossible.

Fix the issue by not ignoring the return value of
kvm_s390_mmu_cache_topup(), and appropriately returning an error code
in case of failure.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
 arch/s390/kvm/priv.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 447ec7ed423d..9bc6fd02ff77 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -366,7 +366,9 @@ static int handle_sske(struct kvm_vcpu *vcpu)
 		if (rc > 1)
 			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 		if (rc == -ENOMEM) {
-			kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+			rc = kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+			if (rc)
+				return rc;
 			continue;
 		}
 		if (rc < 0)
@@ -1122,7 +1124,9 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 			if (rc > 1)
 				return kvm_s390_inject_program_int(vcpu, rc);
 			if (rc == -ENOMEM) {
-				kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+				rc = kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+				if (rc)
+					return rc;
 				continue;
 			}
 			if (rc < 0)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control()
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (4 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 5/9] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:19   ` sashiko-bot
  2026-06-22 16:07 ` [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

All subfunctions in kvm_s390_set_mem_control() need to take the
kvm->lock, so factor it out of the switch.

Also add the missing locking around dat_reset_cmma().

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ffb20a64d328..93141a68e0dd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -962,6 +962,8 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 {
 	int ret;
 
+	guard(mutex)(&kvm->lock);
+
 	switch (attr->attr) {
 	case KVM_S390_VM_MEM_ENABLE_CMMA:
 		ret = -ENXIO;
@@ -969,7 +971,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 			break;
 
 		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
-		mutex_lock(&kvm->lock);
 		if (kvm->created_vcpus)
 			ret = -EBUSY;
 		else {
@@ -978,7 +979,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 			kvm->arch.use_pfmfi = 0;
 			ret = 0;
 		}
-		mutex_unlock(&kvm->lock);
 		break;
 	case KVM_S390_VM_MEM_CLR_CMMA: {
 		gfn_t start_gfn = 0;
@@ -992,7 +992,8 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 
 		VM_EVENT(kvm, 3, "%s", "RESET: CMMA states");
 		do {
-			start_gfn = dat_reset_cmma(kvm->arch.gmap->asce, start_gfn);
+			scoped_guard(read_lock, &kvm->mmu_lock)
+				start_gfn = dat_reset_cmma(kvm->arch.gmap->asce, start_gfn);
 			cond_resched();
 		} while (start_gfn);
 		ret = 0;
@@ -1010,8 +1011,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 		if (get_user(new_limit, (u64 __user *)attr->addr))
 			return -EFAULT;
 
-		guard(mutex)(&kvm->lock);
-
 		new_limit = ALIGN(new_limit, HPAGE_SIZE);
 		if (kvm->arch.mem_limit != KVM_S390_NO_MEM_LIMIT &&
 		    new_limit > kvm->arch.mem_limit)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (5 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:27   ` sashiko-bot
  2026-06-22 16:07 ` [PATCH v5 8/9] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 9/9] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
  8 siblings, 1 reply; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

It is possible that some guest memory areas have not been touched yet
when starting migration mode, and thus have no ptes allocated. Only
existing and allocated ptes should count toward the total of dirty cmma
entries.

When starting migration mode, count how many pages actually have a pte
(and PGSTE), instead of blindly counting the number of pages in all
memslots.

Also fix dat_get_cmma() to properly wrap around if the first attempt
reached the end of guest memory without finding cmma-dirty pages.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c      |  3 +++
 arch/s390/kvm/gmap.c     | 11 +++++++++--
 arch/s390/kvm/kvm-s390.c |  8 +++-----
 arch/s390/kvm/priv.c     |  2 +-
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index cffac7782c4b..0ad4ebc80eba 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
 	};
 
 	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
+	/* If no dirty pages were found, wrap around and continue searching */
+	if (*start && state.start == -1)
+		_dat_walk_gfn_range(0, *start, asce, &ops, DAT_WALK_IGN_HOLES, &state);
 
 	if (state.start == -1) {
 		*count = 0;
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index e6e786811db8..a9a37fdeb809 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -1075,7 +1075,13 @@ int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gf
 
 static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
 {
-	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.cmma_d = 1;
+	pgste_set_unlock(ptep, pgste);
+	atomic64_inc(walk->priv);
+
 	if (need_resched())
 		return next;
 	return 0;
@@ -1089,7 +1095,8 @@ void gmap_set_cmma_all_dirty(struct gmap *gmap)
 	do {
 		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
 			gfn = _dat_walk_gfn_range(gfn, asce_end(gmap->asce), gmap->asce, &ops,
-						  DAT_WALK_IGN_HOLES, NULL);
+						  DAT_WALK_IGN_HOLES,
+						  &gmap->kvm->arch.cmma_dirty_pages);
 		cond_resched();
 	} while (gfn);
 }
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 93141a68e0dd..cdd8b41d24ed 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1190,7 +1190,6 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
 {
 	struct kvm_memory_slot *ms;
 	struct kvm_memslots *slots;
-	unsigned long ram_pages = 0;
 	int bkt;
 
 	/* migration mode already enabled */
@@ -1207,12 +1206,11 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
 	kvm_for_each_memslot(ms, bkt, slots) {
 		if (!ms->dirty_bitmap)
 			return -EINVAL;
-		ram_pages += ms->npages;
 	}
 	/* mark all the pages as dirty */
+	atomic64_set(&kvm->arch.cmma_dirty_pages, 0);
 	gmap_set_cmma_all_dirty(kvm->arch.gmap);
-	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
-	kvm->arch.migration_mode = 1;
+	WRITE_ONCE(kvm->arch.migration_mode, 1);
 	kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
 	return 0;
 }
@@ -1226,7 +1224,7 @@ static int kvm_s390_vm_stop_migration(struct kvm *kvm)
 	/* migration mode already disabled */
 	if (!kvm->arch.migration_mode)
 		return 0;
-	kvm->arch.migration_mode = 0;
+	WRITE_ONCE(kvm->arch.migration_mode, 0);
 	if (kvm->arch.use_cmma)
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
 	return 0;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9bc6fd02ff77..ad0ddc433a73 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -1236,7 +1236,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 						: ESSA_SET_STABLE_IF_RESIDENT))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	if (!vcpu->kvm->arch.migration_mode) {
+	if (!READ_ONCE(vcpu->kvm->arch.migration_mode)) {
 		/*
 		 * CMMA is enabled in the KVM settings, but is disabled in
 		 * the SIE block and in the mm_context, and we are not doing
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 8/9] KVM: s390: selftests: Fix cmma selftest
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (6 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  2026-06-22 16:07 ` [PATCH v5 9/9] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
  8 siblings, 0 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

The existing cmma selftest depended on the host allocating page tables
for all present memslots. Since the gmap rewrite, memory that is not
accessed by the guest might not have page tables allocated yet.

This caused the test to fail due to a mismatch in the assertion.

Fix by having the guest access also the second half of the test
memslot, thus guaranteeing that its page tables are present.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
---
 tools/testing/selftests/kvm/s390/cmma_test.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/kvm/s390/cmma_test.c b/tools/testing/selftests/kvm/s390/cmma_test.c
index e39a724fe860..15d81b2ed7ad 100644
--- a/tools/testing/selftests/kvm/s390/cmma_test.c
+++ b/tools/testing/selftests/kvm/s390/cmma_test.c
@@ -34,16 +34,22 @@ static char cmma_value_buf[MAIN_PAGE_COUNT + TEST_DATA_PAGE_COUNT];
 /**
  * Dirty CMMA attributes of exactly one page in the TEST_DATA memslot,
  * so use_cmma goes on and the CMMA related ioctls do something.
+ * Touch the page at offset 1M inside TEST_DATA to make sure its page
+ * tables are allocated in the host.
  */
 static void guest_do_one_essa(void)
 {
 	asm volatile(
 		/* load TEST_DATA_START_GFN into r1 */
+		"	xgr 1,1\n"
 		"	llilf 1,%[start_gfn]\n"
 		/* calculate the address from the gfn */
 		"	sllg 1,1,12(0)\n"
 		/* set the first page in TEST_DATA memslot to STABLE */
 		"	.insn rrf,0xb9ab0000,2,1,1,0\n"
+		"	agfi 1,0x100000\n"
+		/* also touch the first page of the second MB of TEST_DATA */
+		"	.insn rrf,0xb9ab0000,2,1,1,0\n"
 		/* hypercall */
 		"	diag 0,0,0x501\n"
 		"0:	j 0b"
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 9/9] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits()
  2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
                   ` (7 preceding siblings ...)
  2026-06-22 16:07 ` [PATCH v5 8/9] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
@ 2026-06-22 16:07 ` Claudio Imbrenda
  8 siblings, 0 replies; 16+ messages in thread
From: Claudio Imbrenda @ 2026-06-22 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

If the allocation of the bits array failed, kvm_s390_set_cmma_bits()
would return 0 instead of an error code.

Rework the function to use the __free() macros and thus simplify the
code flow; when the above mentioned allocation fails, simply return
-ENOMEM.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index cdd8b41d24ed..27d6004132d2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2282,8 +2282,8 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 				  const struct kvm_s390_cmma_log *args)
 {
-	struct kvm_s390_mmu_cache *mc;
-	u8 *bits = NULL;
+	struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache) = NULL;
+	u8 *bits __free(kvfree) = NULL;
 	int r = 0;
 
 	if (!kvm->arch.use_cmma)
@@ -2303,18 +2303,16 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 		return -ENOMEM;
 	bits = vmalloc(array_size(sizeof(*bits), args->count));
 	if (!bits)
-		goto out;
+		return -ENOMEM;
 
 	r = copy_from_user(bits, (void __user *)args->values, args->count);
-	if (r) {
-		r = -EFAULT;
-		goto out;
-	}
+	if (r)
+		return -EFAULT;
 
 	do {
 		r = kvm_s390_mmu_cache_topup(mc);
 		if (r)
-			break;
+			return r;
 		scoped_guard(read_lock, &kvm->mmu_lock) {
 			r = dat_set_cmma_bits(mc, kvm->arch.gmap->asce, args->start_gfn,
 					      args->count, args->mask, bits);
@@ -2322,9 +2320,7 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 	} while (r == -ENOMEM);
 
 	set_bit(GMAP_FLAG_USES_CMM, &kvm->arch.gmap->flags);
-out:
-	kvm_s390_free_mmu_cache(mc);
-	vfree(bits);
+
 	return r;
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow
  2026-06-22 16:07 ` [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
@ 2026-06-22 16:18   ` Christian Borntraeger
  2026-06-22 16:23   ` sashiko-bot
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2026-06-22 16:18 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-kernel
  Cc: kvm, linux-s390, frankja, david, seiden, nrb, schlameuss, gra,
	hca, gerald.schaefer, gor, agordeev, svens



Am 22.06.26 um 18:07 schrieb Claudio Imbrenda:
> If userspace passes a start address that is out of bounds,
> _dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
> touched and will stay 0. This will cause *count to underflow and report
> a very high number, and the function will end up erroneously reporting
> success.
> 
> Fix by only setting *count if the end address is not smaller than the
> starting address. This way invalid starting addresses will correctly
> return -EFAULT and *count will correctly indicate that no values have
> been returned.
> 
> Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA")
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>

> ---
>   arch/s390/kvm/dat.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index 4a41c0247ffa..cffac7782c4b 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1209,7 +1209,7 @@ int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
>   	int rc;
>   
>   	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
> -	*count = state.end - start;
> +	*count = state.end >= start ? state.end - start : 0;
>   	/* Return success if at least one value was saved, otherwise an error. */
>   	return (rc == -EFAULT && *count > 0) ? 0 : rc;
>   }


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control()
  2026-06-22 16:07 ` [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
@ 2026-06-22 16:19   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-22 16:19 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Vasily Gorbik, kvm, Heiko Carstens, Alexander Gordeev, linux-s390

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Holding `kvm->lock` across `get_user()` introduces a userfaultfd deadlock risk and denial of service vector.
--

commit 86b3447d27dc01346bae78704ae167d2c9133c5c
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix locking in kvm_s390_set_mem_control()

This commit refactors kvm_s390_set_mem_control() by moving the kvm->lock
acquisition to the top of the function using a scope-based guard, and adds
missing locking around dat_reset_cmma().

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -962,6 +962,8 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
>  {
>  	int ret;
>  
> +	guard(mutex)(&kvm->lock);
> +
>  	switch (attr->attr) {

[ ... ]

>  	case KVM_S390_VM_MEM_LIMIT_SIZE: {
>  		struct kvm_memslots *slots;
>  		struct kvm_memory_slot *ms;
>  		unsigned long new_limit;
>  		int bkt;
>  
>  		if (kvm_is_ucontrol(kvm))
>  			return -EINVAL;
>  
>  		if (get_user(new_limit, (u64 __user *)attr->addr))
>  			return -EFAULT;
>  
> -		guard(mutex)(&kvm->lock);
> -

[Severity: High]
Does this new lock placement introduce a userfaultfd deadlock risk?

By moving the guard(mutex)(&kvm->lock) to the top of the function, the lock
is now held during the get_user() call in the KVM_S390_VM_MEM_LIMIT_SIZE case.

If a userspace application backs the attr->addr pointer with a userfaultfd
region, the kernel page fault triggered by get_user() will block waiting for
the userspace handler to resolve it.

If that userspace handler then attempts to execute another KVM ioctl that
acquires kvm->lock (such as KVM_SET_USER_MEMORY_REGION), could this result
in an ABBA deadlock that hangs the kernel thread and permanently locks the
KVM instance?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622160739.202044-1-imbrenda@linux.ibm.com?part=6

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow
  2026-06-22 16:07 ` [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
  2026-06-22 16:18   ` Christian Borntraeger
@ 2026-06-22 16:23   ` sashiko-bot
  1 sibling, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-22 16:23 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Heiko Carstens, kvm, Alexander Gordeev, Vasily Gorbik, linux-s390

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Uninitialized kernel memory is leaked to userspace during CMMA peek and get operations due to gaps left in the vmalloc'd buffer.
--

commit 7a3f41eb2c82217db779d886fe0375b646e0440a
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix dat_peek_cmma() overflow

This commit addresses an integer underflow in dat_peek_cmma() where an
out-of-bounds start address returns -EFAULT but leaves state.end at 0. It
corrects the behavior by ensuring the returned count is only updated if
the end address is greater than or equal to the starting address.

> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index 4a41c0247ffa2..cffac7782c4bf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1209,7 +1209,7 @@ int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
>  	int rc;
>  
>  	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
> -	*count = state.end - start;
> +	*count = state.end >= start ? state.end - start : 0;

[Severity: High]
This is a pre-existing issue and was not introduced by this patch, but does
this leave uninitialized kernel memory in the values array that gets copied
to userspace?

Looking at kvm_s390_get_cmma_bits() in arch/s390/kvm/kvm-s390.c, the values
buffer is allocated with vmalloc() instead of vzalloc():

kvm_s390_get_cmma_bits() {
    ...
    values = vmalloc(args->count);
    ...
}

During the page table walk, if __dat_peek_cmma_crste() encounters an
invalid PMD, it advances state->end without populating state->values:

__dat_peek_cmma_crste() {
    ...
    state->end = min(walk->end, next);
    ...
}

Similarly, in __dat_get_cmma_pte(), clean pages are skipped without writing
to the values buffer:

__dat_get_cmma_pte() {
    ...
    if (!READ_ONCE(*pgste_of(ptep)).cmma_d)
        return 0;
    ...
}

Because the count returned by dat_peek_cmma() and dat_get_cmma() represents
the span from the start to state.end, any unpopulated gaps in the values
array will contain whatever uninitialized kernel heap memory vmalloc()
provided.

When kvm_s390_get_cmma_bits() later copies this buffer:

kvm_s390_get_cmma_bits() {
    ...
    if (copy_to_user((void __user *)args->values, values, args->count))
    ...
}

Can these gaps leak uninitialized memory to userspace? Should the buffer be
allocated with vzalloc() to ensure any gaps left by the walk are safely
zeroed out?

>  	/* Return success if at least one value was saved, otherwise an error. */
>  	return (rc == -EFAULT && *count > 0) ? 0 : rc;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622160739.202044-1-imbrenda@linux.ibm.com?part=2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level()
  2026-06-22 16:07 ` [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
@ 2026-06-22 16:24   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-22 16:24 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, linux-s390, Heiko Carstens, Vasily Gorbik, Alexander Gordeev

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Preexisting ABBA deadlock between KVM `mmu_lock` and gmap `children_lock`.
--

commit d8dbe6f4197c6a65f9a60ab673cf05c59e3ecf7b
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix code typo in gmap_protect_asce_top_level()

This commit fixes a typo in gmap_protect_asce_top_level() where asce.dt
was used instead of asce.tl for the length passed to
kvm_s390_get_guest_pages().

> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index 3192f610f6962..e6e786811db86 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -1262,7 +1262,7 @@ static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gma
>  	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
>  	smp_rmb();
>  
> -	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + 1, false);
> +	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.tl + 1, false);
>  	if (rc > 0)
>  		rc = -EFAULT;
>  	if (!rc)
>  		rc = _gmap_protect_asce_top_level(mc, sg, &context);

[Severity: High]
This isn't a bug introduced by this patch, but is there a preexisting ABBA
deadlock between parent->children_lock and parent->kvm->mmu_lock here?

In __gmap_protect_asce_top_level() (which handles setting up non-real-space
shadows), the code acquires sg->kvm->mmu_lock first, and then acquires
parent->children_lock:

arch/s390/kvm/gmap.c:__gmap_protect_asce_top_level() {
    guard(write_lock)(&sg->kvm->mmu_lock);
    ...
    scoped_guard(spinlock, &parent->children_lock) {
        ...
}

However, in gmap_create_shadow(), the lock ordering is reversed. It acquires
parent->children_lock first and then acquires parent->kvm->mmu_lock when
creating a real-space shadow:

arch/s390/kvm/gmap.c:gmap_create_shadow() {
    scoped_guard(spinlock, &parent->children_lock) {
        ...
        if (sg->guest_asce.r) {
            scoped_guard(write_lock, &parent->kvm->mmu_lock)
                ...
}

If two vCPUs of the same VM concurrently attempt to create a real-space
nested shadow and a regular nested shadow, could this lead to a soft lockup?

>  	if (rc)
>  		kvm_s390_release_faultin_array(sg->kvm, context.f, true);
>  	return rc;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622160739.202044-1-imbrenda@linux.ibm.com?part=4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty
  2026-06-22 16:07 ` [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
@ 2026-06-22 16:27   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-22 16:27 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Heiko Carstens, Alexander Gordeev, kvm, linux-s390, Vasily Gorbik

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Missing special page checks (`.s` flag) in `gmap_unmap` and `destroy_pages` paths lead to a host kernel panic when mapping I/O memory into Secure Execution VMs.
- [Critical] Missing special page checks in storage key and reference bit handlers allow an unprivileged guest to crash the host via addressing exceptions.
- [Critical] In `_kvm_s390_pv_make_secure()`, KVM unconditionally calls `pfn_folio()` on the faulting PFN and attempts to lock it without verifying if the PFN is backed by a valid `struct page`.
--

commit 9b9c8471e051ee12971f206fa564520a1f1fea5b
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
Subject: KVM: s390: Do not set special large pages dirty

This commit updates gmap_clear_young_crste() to avoid setting special large
pages dirty. It adds a check for the special bit before calling
folio_set_dirty().

> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index 52d55ddea8d4f..3192f610f6962 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -327,7 +327,7 @@ static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, st
>  		new.h.i = 1;
>  		new.s.fc1.y = 0;
>  		new.s.fc1.prefix_notif = 0;
> -		if (new.s.fc1.d || !new.h.p)
> +		if ((new.s.fc1.d || !new.h.p) && !new.s.fc1.s)
>  			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));

[Severity: Critical]
This isn't a bug introduced by this patch, but while looking at the special bit
check added here, I noticed similar missing checks in other paths.

In arch/s390/kvm/gmap.c:_gmap_unmap_crste():

    if (old.s.fc1.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
        folio = phys_to_folio(crste_origin_large(old));

And _gmap_unmap_pte():

    if (ptep->s.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
        folio = pfn_folio(ptep->h.pfra);

Are we missing a check for the special bit here before converting the physical
address to a folio? If a Secure Execution VM uses device memory (like
VM_PFNMAP) without a backing struct page, could extracting the PFN and
calling phys_to_folio() / pfn_folio() return a garbage pointer and cause a
host panic when accessed?

Similarly, in _destroy_pages_crste():

    if (!crstep->h.fc || !crstep->s.fc1.pr)
        return 0;
    origin = crste_origin_large(*crstep);

And _destroy_pages_pte():

    if (!ptep->s.pr)
        return 0;
    __kvm_s390_pv_destroy_page(phys_to_page(pte_origin(*ptep)));

Could these paths also trigger a host panic by unconditionally calling
phys_to_page() on special pages?

[Severity: Critical]
This is a pre-existing issue, but I see a similar pattern in dat.c.

In arch/s390/kvm/dat.c:dat_reset_skeys_crste():

    if (!crstep->h.fc || !crstep->s.fc1.pr)
        return 0;
    // ...
        addr = sske_frame(addr, PAGE_DEFAULT_KEY);

In dat_reset_reference_bit():

    if (!crste.h.fc || !crste.s.fc1.pr)
        return 0;
    return page_reset_referenced(large_crste_to_phys(*crstep, gfn));

In dat_get_storage_key():

    if (!crste.h.fc || !crste.s.fc1.pr)
        return 0;
    skey->skey = page_get_storage_key(large_crste_to_phys(crste, gfn));

Does this code fail to check the special bit before executing inline
assemblies (sske, rrbe, iske) on the origin physical address?

If a guest maps I/O memory and executes these instructions, would KVM execute
them on the host side using the I/O physical address, potentially causing an
addressing exception and host panic since I/O memory does not support storage
keys?

[Severity: Critical]
This is also a pre-existing issue, but I noticed another unprotected
pfn_folio() call in pv.c.

In arch/s390/kvm/pv.c:_kvm_s390_pv_make_secure():

    folio = pfn_folio(f->pfn);
    priv->rc = -EAGAIN;
    // ...
    if (folio_trylock(folio)) {
        priv->rc = __kvm_s390_pv_make_secure(f, folio);

Is it possible for f->pfn to be a raw device PFN (where f->page is NULL) if
the fault resolves to a special device mapping like VM_PFNMAP?

If so, would calling pfn_folio() and folio_trylock() on an invalid folio
pointer lead to a host kernel panic?

>  		new.s.fc1.d = 0;
>  		new.h.p = 1;
>  	} while (!dat_crstep_xchg_atomic(crstep, crste, new, gfn, walk->asce));

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622160739.202044-1-imbrenda@linux.ibm.com?part=3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking
  2026-06-22 16:07 ` [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
@ 2026-06-22 16:27   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-22 16:27 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Alexander Gordeev, kvm, Heiko Carstens, Vasily Gorbik, linux-s390

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] A stalled VCPU thread in `handle_essa()` racing with `kvm_s390_vm_start_migration()` double-counts `cmma_dirty_pages`, causing an infinite loop in userspace.

Pre-existing issues:
- [Critical] Uninitialized kernel memory from `vmalloc` is leaked to userspace via the CMMA state buffer.
- [High] CMMA state of newly mapped pages can be silently lost during the start of migration due to a TOCTOU race with `gmap_set_cmma_all_dirty` running while `migration_mode == 0`.
--

commit d55126b5a3d84f06909a4a559b0185411825d1f2
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix cmma dirty tracking

This commit updates migration mode initialization to accurately count pages
with allocated PTEs instead of counting all pages in memslots. It also fixes
dat_get_cmma() to wrap around when searching for cmma-dirty pages.

> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index cffac7782c4bf..0ad4ebc80ebaf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
>  	};
>  
>  	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
> +	/* If no dirty pages were found, wrap around and continue searching */
> +	if (*start && state.start == -1)
> +		_dat_walk_gfn_range(0, *start, asce, &ops, DAT_WALK_IGN_HOLES, &state);

[Severity: Critical]
This is a pre-existing issue, but does the values buffer leak uninitialized
kernel memory to userspace?

In kvm_s390_get_cmma_bits(), the values buffer is allocated via vmalloc()
without zeroing:

arch/s390/kvm/kvm-s390.c:kvm_s390_get_cmma_bits() {
    ...
    values = vmalloc(args->count);
    if (!values)
        return -ENOMEM;
    ...
}

When dat_get_cmma() iterates over guest memory, __dat_get_cmma_pte() skips
unmapped holes and non-dirty pages (cmma_d == 0). This leaves uninitialized
gaps in the state->values array between the first and last dirty pages in the
batch.

Since the entire array up to args->count is later copied to userspace via
copy_to_user(), could this expose sensitive heap data?

[ ... ]
> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index e6e786811db86..a9a37fdeb809b 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -1075,7 +1075,13 @@ int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gf
>  
>  static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
>  {
> -	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
> +	union pgste pgste;
> +
> +	pgste = pgste_get_lock(ptep);
> +	pgste.cmma_d = 1;
> +	pgste_set_unlock(ptep, pgste);
> +	atomic64_inc(walk->priv);

[Severity: High]
Can this cause double-counting of cmma_dirty_pages if a VCPU thread in
handle_essa() races with kvm_s390_vm_start_migration()?

If a VCPU locklessly reads migration_mode == 1 in handle_essa(), stalls, and
userspace stops then restarts migration:
kvm_s390_vm_start_migration() resets cmma_dirty_pages = 0 and starts
gmap_set_cmma_all_dirty().

The stalled VCPU thread could then resume, enter __do_essa(), set cmma_d = 1,
and increment cmma_dirty_pages. When __set_cmma_dirty_pte() reaches the same
page later, it lacks a check for !pgste.cmma_d, blindly setting cmma_d = 1
and incrementing cmma_dirty_pages again.

Would this over-counting cause userspace (like QEMU) to spin infinitely when
calling the KVM_GET_CMMA_BITS ioctl, as it decrements its remaining counter
expecting more pages than actually exist?

[ ... ]
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 93141a68e0dde..cdd8b41d24ed2 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
[ ... ]
>  	/* mark all the pages as dirty */
> +	atomic64_set(&kvm->arch.cmma_dirty_pages, 0);
>  	gmap_set_cmma_all_dirty(kvm->arch.gmap);
> -	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> -	kvm->arch.migration_mode = 1;
> +	WRITE_ONCE(kvm->arch.migration_mode, 1);

[Severity: High]
This isn't a bug introduced by this patch, but does kvm_s390_vm_start_migration()
have a race condition because gmap_set_cmma_all_dirty() runs entirely while
migration_mode == 0?

During gmap_set_cmma_all_dirty(), if a VCPU faults in a new page at an address
already scanned by the walker, the PTE is created with cmma_d = 0. Since
migration_mode is still 0, the VCPU can execute ESSA instructions which
hardware handles natively.

This would modify the CMMA state in the PGSTE but leave cmma_d = 0. When the
migration scan finishes and sets migration_mode = 1, does the newly dirtied
page remain untracked, causing guest state corruption across live migration?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622160739.202044-1-imbrenda@linux.ibm.com?part=7

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-06-22 16:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 16:07 [PATCH v5 0/9] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
2026-06-22 16:07 ` [PATCH v5 1/9] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-22 16:07 ` [PATCH v5 2/9] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
2026-06-22 16:18   ` Christian Borntraeger
2026-06-22 16:23   ` sashiko-bot
2026-06-22 16:07 ` [PATCH v5 3/9] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-22 16:27   ` sashiko-bot
2026-06-22 16:07 ` [PATCH v5 4/9] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
2026-06-22 16:24   ` sashiko-bot
2026-06-22 16:07 ` [PATCH v5 5/9] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
2026-06-22 16:07 ` [PATCH v5 6/9] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
2026-06-22 16:19   ` sashiko-bot
2026-06-22 16:07 ` [PATCH v5 7/9] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
2026-06-22 16:27   ` sashiko-bot
2026-06-22 16:07 ` [PATCH v5 8/9] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
2026-06-22 16:07 ` [PATCH v5 9/9] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox