* [GIT PULL 1/3] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Janosch Frank @ 2025-09-09 11:46 UTC
To: pbonzini; +Cc: kvm, frankja, david, borntraeger, cohuck, linux-s390, imbrenda
From: Thomas Huth <thuth@redhat.com>
When you run a KVM guest with vhost-net and migrate that guest to
another host, and you immediately enable postcopy after starting the
migration, there is a high chance that the guest's network connection
no longer works on the destination host after the migration.
A debug kernel (v6.16.0) additionally prints a call trace like
this:
FAULT_FLAG_ALLOW_RETRY missing 881
CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
Hardware name: IBM 3931 LA1 400 (LPAR)
Workqueue: events irqfd_inject [kvm]
Call Trace:
[<00003173cbecc634>] dump_stack_lvl+0x104/0x168
[<00003173cca69588>] handle_userfault+0xde8/0x1310
[<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
[<00003173cc759212>] __handle_mm_fault+0x452/0xa00
[<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
[<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
[<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
[<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
[<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
[<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
[<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
[<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
[<00003173cc00c9ec>] process_one_work+0x87c/0x1490
[<00003173cc00dda6>] worker_thread+0x7a6/0x1010
[<00003173cc02dc36>] kthread+0x3b6/0x710
[<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
[<00003173cdd737ca>] ret_from_fork+0xa/0x30
3 locks held by kworker/6:2/549:
#0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
#1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
#2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
The "FAULT_FLAG_ALLOW_RETRY missing" message indicates that handle_userfault()
saw a page fault request without the ALLOW_RETRY flag set, so userfaultfd
could not resolve it remotely: get_map_page() passed no "locked" pointer
to get_user_pages_remote(), so the fault had to be resolved immediately,
without dropping mmap_lock, while remote faults can take time.
As a result, get_map_page() failed and the interrupt was lost.
This code should not run in atomic context, and the worker should be
able to sleep (the call is done during an ioctl from userspace), so
adapter_indicators_set() can simply sleep while waiting for the remote
fault to be resolved.
Link: https://issues.redhat.com/browse/RHEL-42486
Signed-off-by: Peter Xu <peterx@redhat.com>
[thuth: Assembled patch description and fixed some cosmetic issues]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
[frankja: Added fixes tag]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
arch/s390/kvm/interrupt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 2a92a8b9e4c2..9384572ffa7b 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2778,12 +2778,19 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
{
+ struct mm_struct *mm = kvm->mm;
struct page *page = NULL;
+ int locked = 1;
+
+ if (mmget_not_zero(mm)) {
+ mmap_read_lock(mm);
+ get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
+ &page, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ mmput(mm);
+ }
- mmap_read_lock(kvm->mm);
- get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
- &page, NULL);
- mmap_read_unlock(kvm->mm);
return page;
}
--
2.51.0
* [GIT PULL 2/3] KVM: s390: Fix incorrect usage of mmu_notifier_register()
From: Janosch Frank @ 2025-09-09 11:46 UTC
To: pbonzini; +Cc: kvm, frankja, david, borntraeger, cohuck, linux-s390, imbrenda
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
If mmu_notifier_register() fails, for example because a signal was
pending, the mmu_notifier is not registered. When the VM is destroyed,
however, the notifier is unregistered anyway, which causes one extra
mmdrop(). This eventually frees the mm of the process too early and
leads to a use-after-free.
This bug happens rarely, and only when secure guests are involved.
The solution is to check the return value of mmu_notifier_register()
and return it to the caller (ultimately it will be propagated all the
way to userspace). In case of -EINTR, userspace will try again.
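A minimal userspace model of the reference-count imbalance, assuming (as
described above) that registration takes an mm reference only on success
and that VM teardown unregisters whenever mmu_notifier.ops is set. All
model_* names are hypothetical; the integer stands in for the mm
reference count and the "fixed" flag selects old or new behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm { int mm_count; };            /* like mm->mm_count */
struct model_notifier { const void *ops; };   /* like kvm->arch.pv.mmu_notifier */

static const int model_ops;  /* stands in for kvm_s390_pv_mmu_notifier_ops */

/* Like mmu_notifier_register(): takes an mm reference only on success. */
static int model_register(struct model_mm *mm, bool signal_pending)
{
	if (signal_pending)
		return -EINTR;
	mm->mm_count++;          /* mmgrab() */
	return 0;
}

/* Like mmu_notifier_unregister(): always drops one reference. */
static void model_unregister(struct model_mm *mm)
{
	assert(mm->mm_count > 0);
	mm->mm_count--;          /* mmdrop() */
}

static int model_init_vm(struct model_notifier *n, struct model_mm *mm,
			 bool signal_pending, bool fixed)
{
	if (n->ops != &model_ops) {
		n->ops = &model_ops;
		int ret = model_register(mm, signal_pending);
		if (ret && fixed) {
			n->ops = NULL;   /* the fix: forget the failed registration */
			return ret;      /* and report it to the caller */
		}
		/* old code: the error was ignored and ops stayed set */
	}
	return 0;
}

/* VM teardown: unregisters whenever ops claims a registration. */
static void model_destroy_vm(struct model_notifier *n, struct model_mm *mm)
{
	if (n->ops)
		model_unregister(mm);
}
```

With the old behaviour, a failed registration followed by teardown drops
the owner's own reference (the count reaches 0 too early); with the fix,
the error is returned, userspace retries, and the count stays balanced.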
Fixes: ca2fd0609b5d ("KVM: s390: pv: add mmu_notifier")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
arch/s390/kvm/pv.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 25ede8354514..6ba5a0305e25 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -624,6 +624,17 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
int cc, ret;
u16 dummy;
+ /* Add the notifier only once. No races because we hold kvm->lock */
+ if (kvm->arch.pv.mmu_notifier.ops != &kvm_s390_pv_mmu_notifier_ops) {
+ /* The notifier will be unregistered when the VM is destroyed */
+ kvm->arch.pv.mmu_notifier.ops = &kvm_s390_pv_mmu_notifier_ops;
+ ret = mmu_notifier_register(&kvm->arch.pv.mmu_notifier, kvm->mm);
+ if (ret) {
+ kvm->arch.pv.mmu_notifier.ops = NULL;
+ return ret;
+ }
+ }
+
ret = kvm_s390_pv_alloc_vm(kvm);
if (ret)
return ret;
@@ -659,11 +670,6 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
return -EIO;
}
kvm->arch.gmap->guest_handle = uvcb.guest_handle;
- /* Add the notifier only once. No races because we hold kvm->lock */
- if (kvm->arch.pv.mmu_notifier.ops != &kvm_s390_pv_mmu_notifier_ops) {
- kvm->arch.pv.mmu_notifier.ops = &kvm_s390_pv_mmu_notifier_ops;
- mmu_notifier_register(&kvm->arch.pv.mmu_notifier, kvm->mm);
- }
return 0;
}
--
2.51.0
* [GIT PULL 3/3] KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
From: Janosch Frank @ 2025-09-09 11:46 UTC
To: pbonzini; +Cc: kvm, frankja, david, borntraeger, cohuck, linux-s390, imbrenda
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
Pass the right type of flag to vcpu_dat_fault_handler(): it expects
FOLL_* flags (in particular FOLL_WRITE), but FAULT_FLAG_WRITE was passed
instead.
This happens to work because both flags have the same integer value, but
it is still a mistake, hence the fix. The parameter is also renamed from
"flags" to "foll" to make the expected flag type explicit.
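A sketch of the distinction, using hypothetical MODEL_* values and
wrapper types. In the kernel both kinds of flags are passed as plain
unsigned int, and FOLL_WRITE and FAULT_FLAG_WRITE are both bit 0, which
is why the mixup compiled and worked silently. Wrapping each flag
namespace in its own struct is one way to turn such a mixup into a
compile error; it is an illustration only, not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values; only the WRITE bits mirror the kernel. */
#define MODEL_FOLL_WRITE        (1u << 0)
#define MODEL_FOLL_NOWAIT       (1u << 4)   /* illustrative value only */
#define MODEL_FAULT_FLAG_WRITE  (1u << 0)

/* Distinct wrapper types: passing one where the other is expected fails to compile. */
struct foll_flags { unsigned int v; };
struct fault_flags { unsigned int v; };

/* Mirrors: fault_flags = foll & FOLL_WRITE ? FAULT_FLAG_WRITE : 0; */
static struct fault_flags foll_to_fault_flags(struct foll_flags foll)
{
	struct fault_flags f = { (foll.v & MODEL_FOLL_WRITE) ? MODEL_FAULT_FLAG_WRITE : 0u };
	return f;
}

/* Mirrors the fixed caller: a write fault is described with FOLL_WRITE. */
static struct foll_flags foll_for_access(bool is_write, bool pfault_enabled)
{
	struct foll_flags foll = { is_write ? MODEL_FOLL_WRITE : 0u };

	if (pfault_enabled)
		foll.v |= MODEL_FOLL_NOWAIT;
	return foll;
}
```

Calling foll_to_fault_flags() with a struct fault_flags argument would be
rejected by the compiler, whereas the original code accepted any unsigned
int and relied on naming (now "foll") to keep the two namespaces apart.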
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
---
arch/s390/kvm/kvm-s390.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index bf6fa8b9ca73..6d51aa5f66be 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4864,12 +4864,12 @@ static void kvm_s390_assert_primary_as(struct kvm_vcpu *vcpu)
* @vcpu: the vCPU whose gmap is to be fixed up
* @gfn: the guest frame number used for memslots (including fake memslots)
* @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps
- * @flags: FOLL_* flags
+ * @foll: FOLL_* flags
*
* Return: 0 on success, < 0 in case of error.
* Context: The mm lock must not be held before calling. May sleep.
*/
-int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int flags)
+int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int foll)
{
struct kvm_memory_slot *slot;
unsigned int fault_flags;
@@ -4883,13 +4883,13 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return vcpu_post_run_addressing_exception(vcpu);
- fault_flags = flags & FOLL_WRITE ? FAULT_FLAG_WRITE : 0;
+ fault_flags = foll & FOLL_WRITE ? FAULT_FLAG_WRITE : 0;
if (vcpu->arch.gmap->pfault_enabled)
- flags |= FOLL_NOWAIT;
+ foll |= FOLL_NOWAIT;
vmaddr = __gfn_to_hva_memslot(slot, gfn);
try_again:
- pfn = __kvm_faultin_pfn(slot, gfn, flags, &writable, &page);
+ pfn = __kvm_faultin_pfn(slot, gfn, foll, &writable, &page);
/* Access outside memory, inject addressing exception */
if (is_noslot_pfn(pfn))
@@ -4905,7 +4905,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
return 0;
vcpu->stat.pfault_sync++;
/* Could not setup async pfault, try again synchronously */
- flags &= ~FOLL_NOWAIT;
+ foll &= ~FOLL_NOWAIT;
goto try_again;
}
/* Any other error */
@@ -4925,7 +4925,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
return rc;
}
-static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, unsigned int flags)
+static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, unsigned int foll)
{
unsigned long gaddr_tmp;
gfn_t gfn;
@@ -4950,18 +4950,18 @@ static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, un
}
gfn = gpa_to_gfn(gaddr_tmp);
}
- return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, flags);
+ return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, foll);
}
static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)
{
- unsigned int flags = 0;
+ unsigned int foll = 0;
unsigned long gaddr;
int rc;
gaddr = current->thread.gmap_teid.addr * PAGE_SIZE;
if (kvm_s390_cur_gmap_fault_is_write())
- flags = FAULT_FLAG_WRITE;
+ foll = FOLL_WRITE;
switch (current->thread.gmap_int_code & PGM_INT_CODE_MASK) {
case 0:
@@ -5003,7 +5003,7 @@ static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)
send_sig(SIGSEGV, current, 0);
if (rc != -ENXIO)
break;
- flags = FAULT_FLAG_WRITE;
+ foll = FOLL_WRITE;
fallthrough;
case PGM_PROTECTION:
case PGM_SEGMENT_TRANSLATION:
@@ -5013,7 +5013,7 @@ static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)
case PGM_REGION_SECOND_TRANS:
case PGM_REGION_THIRD_TRANS:
kvm_s390_assert_primary_as(vcpu);
- return vcpu_dat_fault_handler(vcpu, gaddr, flags);
+ return vcpu_dat_fault_handler(vcpu, gaddr, foll);
default:
KVM_BUG(1, vcpu->kvm, "Unexpected program interrupt 0x%x, TEID 0x%016lx",
current->thread.gmap_int_code, current->thread.gmap_teid.val);
--
2.51.0
* Re: [GIT PULL 0/3] KVM: s390: fixes for 6.17
From: Paolo Bonzini @ 2025-09-17 17:54 UTC
To: Janosch Frank; +Cc: kvm, david, borntraeger, cohuck, linux-s390, imbrenda
On 9/9/25 13:46, Janosch Frank wrote:
> Paolo,
>
> here are three fixes for KVM s390. Claudio contributed mm fixes as a
> preparation for upcoming rework and Thomas fixed a postcopy fault.
>
> I've had these on master for two weeks already but there was KVM Forum
> in between so here they are based on rc5.
>
> Please pull.
Pulled, thanks.
Paolo
> Cheers,
> Janosch
>
> The following changes since commit 76eeb9b8de9880ca38696b2fb56ac45ac0a25c6c:
>
> Linux 6.17-rc5 (2025-09-07 14:22:57 -0700)
>
> are available in the Git repository at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-master-6.17-1
>
> for you to fetch changes up to 5f9df945d4e862979b50e4ecaba3dc81fb06e8ed:
>
> KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion (2025-09-09 08:17:39 +0000)
>
> ----------------------------------------------------------------
> - KVM mm fixes
> - Postcopy fix
> ----------------------------------------------------------------
>
> Claudio Imbrenda (2):
> KVM: s390: Fix incorrect usage of mmu_notifier_register()
> KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
>
> Thomas Huth (1):
> KVM: s390: Fix access to unavailable adapter indicator pages during
> postcopy
>
> arch/s390/kvm/interrupt.c | 15 +++++++++++----
> arch/s390/kvm/kvm-s390.c | 24 ++++++++++++------------
> arch/s390/kvm/pv.c | 16 +++++++++++-----
> 3 files changed, 34 insertions(+), 21 deletions(-)
>