public inbox for linux-s390@vger.kernel.org
* [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement
@ 2025-09-30 16:33 Claudio Imbrenda
  2025-09-30 16:33 ` [GIT PULL v1 1/2] KVM: s390: improve interrupt cpu for wakeup Claudio Imbrenda
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Claudio Imbrenda @ 2025-09-30 16:33 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-s390, frankja, borntraeger, david

Ciao Paolo,

here is a small pull request that does two things:

* Improve interrupt cpu for wakeup, change the heuristic to decide which
  vCPU to deliver a floating interrupt to.
* Clear the pte when discarding a swapped page because of CMMA; this
  bug was introduced in 6.16 when refactoring gmap code.

Unfortunately Christian had pushed his patch on -next when it was still
based on the previous release, and he wanted to keep the patch ID stable;
the branch should nonetheless merge cleanly (I tested).


The following changes since commit 57d88f02eb4449d96dfee3af4b7cd4287998bdbd:

  KVM: s390: Rework guest entry logic (2025-07-21 13:01:03 +0000)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-6.18-1

for you to fetch changes up to 5deafa27d9ae040b75d392f60b12e300b42b4792:

  KVM: s390: Fix to clear PTE when discarding a swapped page (2025-09-30 15:58:30 +0200)

----------------------------------------------------------------
KVM: s390: A bugfix and a performance improvement

* Improve interrupt cpu for wakeup, change the heuristic to decide which
  vCPU to deliver a floating interrupt to.
* Clear the pte when discarding a swapped page because of CMMA; this
  bug was introduced in 6.16 when refactoring gmap code.

----------------------------------------------------------------
Christian Borntraeger (1):
      KVM: s390: improve interrupt cpu for wakeup

Gautam Gala (1):
      KVM: s390: Fix to clear PTE when discarding a swapped page

 arch/s390/include/asm/kvm_host.h |  2 +-
 arch/s390/include/asm/pgtable.h  | 22 ++++++++++++++++++++++
 arch/s390/kvm/interrupt.c        | 20 +++++++++-----------
 arch/s390/mm/gmap_helpers.c      | 12 +++++++++++-
 arch/s390/mm/pgtable.c           | 23 +----------------------
 5 files changed, 44 insertions(+), 35 deletions(-)


* [GIT PULL v1 1/2] KVM: s390: improve interrupt cpu for wakeup
  2025-09-30 16:33 [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement Claudio Imbrenda
@ 2025-09-30 16:33 ` Claudio Imbrenda
  2025-09-30 16:33 ` [GIT PULL v1 2/2] KVM: s390: Fix to clear PTE when discarding a swapped page Claudio Imbrenda
  2025-09-30 17:10 ` [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Claudio Imbrenda @ 2025-09-30 16:33 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-s390, frankja, borntraeger, david

From: Christian Borntraeger <borntraeger@linux.ibm.com>

Turns out that picking an idle CPU for floating interrupts has some
negative side effects. The guest will keep the IO workload on its CPU
and rather use an IPI from the interrupt CPU instead of moving workload.
For example a guest with 2 vCPUs and 1 fio process might run that fio on
vcpu1. If after diag500 both vCPUs are idle then vcpu0 is woken up. The
guest will then do an IPI from vcpu0 to vcpu1.

So let's change the heuristics and prefer the last CPU that went to
sleep. This one is likely still in halt polling and can be woken up
quickly.

This patch shows significant improvements in terms of bandwidth or
cpu consumption for fio and uperf workloads and seems to be a net
win.

Link: https://lore.kernel.org/linux-s390/20250904113927.119306-1-borntraeger@linux.ibm.com/
Reviewed-by: Christoph Schlameuß <schlameuss@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 +-
 arch/s390/kvm/interrupt.c        | 20 +++++++++-----------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index f870d09515cc..95d15416c39d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -356,7 +356,7 @@ struct kvm_s390_float_interrupt {
 	int counters[FIRQ_MAX_COUNT];
 	struct kvm_s390_mchk_info mchk;
 	struct kvm_s390_ext_info srv_signal;
-	int next_rr_cpu;
+	int last_sleep_cpu;
 	struct mutex ais_lock;
 	u8 simm;
 	u8 nimm;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 60c360c18690..b8e6f82e92c3 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1322,6 +1322,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 	VCPU_EVENT(vcpu, 4, "enabled wait: %llu ns", sltime);
 no_timer:
 	kvm_vcpu_srcu_read_unlock(vcpu);
+	vcpu->kvm->arch.float_int.last_sleep_cpu = vcpu->vcpu_idx;
 	kvm_vcpu_halt(vcpu);
 	vcpu->valid_wakeup = false;
 	__unset_cpu_idle(vcpu);
@@ -1948,18 +1949,15 @@ static void __floating_irq_kick(struct kvm *kvm, u64 type)
 	if (!online_vcpus)
 		return;
 
-	/* find idle VCPUs first, then round robin */
-	sigcpu = find_first_bit(kvm->arch.idle_mask, online_vcpus);
-	if (sigcpu == online_vcpus) {
-		do {
-			sigcpu = kvm->arch.float_int.next_rr_cpu++;
-			kvm->arch.float_int.next_rr_cpu %= online_vcpus;
-			/* avoid endless loops if all vcpus are stopped */
-			if (nr_tries++ >= online_vcpus)
-				return;
-		} while (is_vcpu_stopped(kvm_get_vcpu(kvm, sigcpu)));
+	for (sigcpu = kvm->arch.float_int.last_sleep_cpu; ; sigcpu++) {
+		sigcpu %= online_vcpus;
+		dst_vcpu = kvm_get_vcpu(kvm, sigcpu);
+		if (!is_vcpu_stopped(dst_vcpu))
+			break;
+		/* avoid endless loops if all vcpus are stopped */
+		if (nr_tries++ >= online_vcpus)
+			return;
 	}
-	dst_vcpu = kvm_get_vcpu(kvm, sigcpu);
 
 	/* make the VCPU drop out of the SIE, or wake it up if sleeping */
 	switch (type) {
-- 
2.51.0
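
The selection loop in the __floating_irq_kick() hunk above can be modelled
outside the kernel. What follows is a minimal user-space sketch, not the
kernel code: struct vm_model, its stopped array and
pick_floating_irq_target() are hypothetical names introduced for
illustration, and the retry bound is simplified to exactly one pass over
the online vCPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's structures. */
struct vm_model {
	int online_vcpus;	/* number of online vCPUs */
	int last_sleep_cpu;	/* index of the vCPU that most recently went to sleep */
	const bool *stopped;	/* stopped[i] is true if vCPU i is stopped */
};

/*
 * Start at the vCPU that went to sleep last and walk forward (with
 * wrap-around) until a vCPU that is not stopped is found.
 * Returns the chosen index, or -1 if there is no usable vCPU.
 */
static int pick_floating_irq_target(const struct vm_model *vm)
{
	int cpu, tries;

	assert(vm->online_vcpus >= 0 && vm->last_sleep_cpu >= 0);
	if (vm->online_vcpus == 0)
		return -1;

	cpu = vm->last_sleep_cpu % vm->online_vcpus;
	/* one full pass is enough; this also avoids an endless loop */
	for (tries = 0; tries < vm->online_vcpus; tries++) {
		if (!vm->stopped[cpu])
			return cpu;
		cpu = (cpu + 1) % vm->online_vcpus;
	}
	return -1;	/* all vCPUs are stopped */
}
```

In this model the preferred target is the last sleeper itself, so the
fallback to its neighbours only happens when that vCPU is stopped.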



* [GIT PULL v1 2/2] KVM: s390: Fix to clear PTE when discarding a swapped page
  2025-09-30 16:33 [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement Claudio Imbrenda
  2025-09-30 16:33 ` [GIT PULL v1 1/2] KVM: s390: improve interrupt cpu for wakeup Claudio Imbrenda
@ 2025-09-30 16:33 ` Claudio Imbrenda
  2025-09-30 17:10 ` [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Claudio Imbrenda @ 2025-09-30 16:33 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-s390, frankja, borntraeger, david

From: Gautam Gala <ggala@linux.ibm.com>

KVM run fails when guests with 'cmm' cpu feature and host are
under memory pressure and use swap heavily. This is because
npages becomes ENOMEM (out of memory) in hva_to_pfn_slow(),
which in turn propagates as EFAULT to qemu. Clearing the page
table entry when discarding an address that maps to a swap
entry resolves the issue.

Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Cc: stable@vger.kernel.org
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Gautam Gala <ggala@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 22 ++++++++++++++++++++++
 arch/s390/mm/gmap_helpers.c     | 12 +++++++++++-
 arch/s390/mm/pgtable.c          | 23 +----------------------
 3 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 6d8bc27a366e..324f96485604 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -2010,4 +2010,26 @@ static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
 	return res;
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 #endif /* _S390_PAGE_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index a45d417ad951..c382005577bd 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -13,6 +13,7 @@
 #include <linux/pagewalk.h>
 #include <linux/ksm.h>
 #include <asm/gmap_helpers.h>
+#include <asm/pgtable.h>
 
 /**
  * ptep_zap_swap_entry() - discard a swap entry.
@@ -45,6 +46,7 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
 {
 	struct vm_area_struct *vma;
 	spinlock_t *ptl;
+	pgste_t pgste;
 	pte_t *ptep;
 
 	mmap_assert_locked(mm);
@@ -58,8 +60,16 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
 	ptep = get_locked_pte(mm, vmaddr, &ptl);
 	if (unlikely(!ptep))
 		return;
-	if (pte_swap(*ptep))
+	if (pte_swap(*ptep)) {
+		preempt_disable();
+		pgste = pgste_get_lock(ptep);
+
 		ptep_zap_swap_entry(mm, pte_to_swp_entry(*ptep));
+		pte_clear(mm, vmaddr, ptep);
+
+		pgste_set_unlock(ptep, pgste);
+		preempt_enable();
+	}
 	pte_unmap_unlock(ptep, ptl);
 }
 EXPORT_SYMBOL_GPL(gmap_helper_zap_one_page);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 7df70cd8f739..6b92c348b56f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -23,6 +23,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/page-states.h>
+#include <asm/pgtable.h>
 #include <asm/machine.h>
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
@@ -114,28 +115,6 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-	unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
 static inline pgste_t pgste_get(pte_t *ptep)
 {
 	unsigned long pgste = 0;
-- 
2.51.0
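
The pgste_get_lock()/pgste_set_unlock() pair moved by this patch is a bit
lock: the lock bit is set with an atomic OR until the previous value shows
it was clear, and the unlock writes the value back with the bit cleared.
A minimal user-space sketch of that pattern with C11 atomics is shown
here. MODEL_LOCK_BIT, model_get_lock() and model_set_unlock() are
hypothetical names, the bit position is an assumption, and acquire/release
atomics stand in for the kernel's __atomic64_or_barrier(), barrier() and
WRITE_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock bit; the position is an assumption for this sketch. */
#define MODEL_LOCK_BIT (UINT64_C(1) << 55)

/*
 * Spin until the lock bit was clear before our OR, which means we are
 * the one who set it. Returns the word's value with the lock bit set.
 */
static uint64_t model_get_lock(_Atomic uint64_t *word)
{
	uint64_t old;

	do {
		old = atomic_fetch_or_explicit(word, MODEL_LOCK_BIT,
					       memory_order_acquire);
	} while (old & MODEL_LOCK_BIT);
	return old | MODEL_LOCK_BIT;
}

/* Publish the (possibly updated) value and drop the lock bit in one store. */
static void model_set_unlock(_Atomic uint64_t *word, uint64_t value)
{
	assert(value & MODEL_LOCK_BIT);	/* caller must hold the lock */
	atomic_store_explicit(word, value & ~MODEL_LOCK_BIT,
			      memory_order_release);
}
```

A caller would bracket its update the same way the fixed
gmap_helper_zap_one_page() does: take the lock, modify the entry, then
unlock with the value obtained from the lock call.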



* Re: [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement
  2025-09-30 16:33 [GIT PULL v1 0/2] KVM: s390: A bugfix and a performance improvement Claudio Imbrenda
  2025-09-30 16:33 ` [GIT PULL v1 1/2] KVM: s390: improve interrupt cpu for wakeup Claudio Imbrenda
  2025-09-30 16:33 ` [GIT PULL v1 2/2] KVM: s390: Fix to clear PTE when discarding a swapped page Claudio Imbrenda
@ 2025-09-30 17:10 ` Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2025-09-30 17:10 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, linux-s390, frankja, borntraeger, david

On Tue, Sep 30, 2025 at 6:34 PM Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>
> Ciao Paolo,
>
> here is a small pull request that does two things:
>
> * Improve interrupt cpu for wakeup, change the heuristic to decide which
>   vCPU to deliver a floating interrupt to.
> * Clear the pte when discarding a swapped page because of CMMA; this
>   bug was introduced in 6.16 when refactoring gmap code.
>
> Unfortunately Christian had pushed his patch on -next when it was still
> based on the previous release, and he wanted to keep the patch ID stable;
> the branch should nonetheless merge cleanly (I tested).

No problem, it merges cleanly and has no semantic conflicts so it's fine.

Pulled, thanks.

Paolo


