* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
2025-08-21 15:23 [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy Thomas Huth
@ 2025-08-25 7:58 ` Janosch Frank
2025-08-25 13:34 ` Claudio Imbrenda
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Janosch Frank @ 2025-08-25 7:58 UTC
To: Thomas Huth, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
On 8/21/25 5:23 PM, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
>
> With a debug kernel v6.16.0, there is also a call trace that looks
> like this:
>
> FAULT_FLAG_ALLOW_RETRY missing 881
> CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
> Hardware name: IBM 3931 LA1 400 (LPAR)
> Workqueue: events irqfd_inject [kvm]
> Call Trace:
> [<00003173cbecc634>] dump_stack_lvl+0x104/0x168
> [<00003173cca69588>] handle_userfault+0xde8/0x1310
> [<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
> [<00003173cc759212>] __handle_mm_fault+0x452/0xa00
> [<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
> [<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
> [<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
> [<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
> [<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
> [<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
> [<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
> [<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
> [<00003173cc00c9ec>] process_one_work+0x87c/0x1490
> [<00003173cc00dda6>] worker_thread+0x7a6/0x1010
> [<00003173cc02dc36>] kthread+0x3b6/0x710
> [<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
> [<00003173cdd737ca>] ret_from_fork+0xa/0x30
> 3 locks held by kworker/6:2/549:
> #0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
> #1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
> #2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
>
> The "FAULT_FLAG_ALLOW_RETRY missing" message indicates that
> handle_userfault() saw a page fault request without the ALLOW_RETRY flag
> set, hence userfaultfd cannot remotely resolve it (because the caller was
> asking for an immediate resolution, aka FAULT_FLAG_NOWAIT, while remote
> faults can take time).
> With that, get_map_page() failed and the irq was lost.
>
> We should not be strictly in an atomic environment here and the worker
> should be sleepable (the call is done during an ioctl from userspace),
> so we can allow adapter_indicators_set() to just sleep waiting for the
> remote fault instead.
>
> Link: https://issues.redhat.com/browse/RHEL-42486
> Signed-off-by: Peter Xu <peterx@redhat.com>
> [thuth: Assembled patch description and fixed some cosmetic issues]
> Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Claudio Imbrenda @ 2025-08-25 13:34 UTC
To: Thomas Huth
Cc: Janosch Frank, kvm, Peter Xu, Christian Borntraeger,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Sven Schnelle, linux-s390, linux-kernel
On Thu, 21 Aug 2025 17:23:09 +0200
Thomas Huth <thuth@redhat.com> wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
>
> With a debug kernel v6.16.0, there is also a call trace that looks
> like this:
>
> FAULT_FLAG_ALLOW_RETRY missing 881
> CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
> Hardware name: IBM 3931 LA1 400 (LPAR)
> Workqueue: events irqfd_inject [kvm]
> Call Trace:
> [<00003173cbecc634>] dump_stack_lvl+0x104/0x168
> [<00003173cca69588>] handle_userfault+0xde8/0x1310
> [<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
> [<00003173cc759212>] __handle_mm_fault+0x452/0xa00
> [<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
> [<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
> [<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
> [<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
> [<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
> [<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
> [<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
> [<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
> [<00003173cc00c9ec>] process_one_work+0x87c/0x1490
> [<00003173cc00dda6>] worker_thread+0x7a6/0x1010
> [<00003173cc02dc36>] kthread+0x3b6/0x710
> [<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
> [<00003173cdd737ca>] ret_from_fork+0xa/0x30
> 3 locks held by kworker/6:2/549:
> #0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
> #1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
> #2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
>
> The "FAULT_FLAG_ALLOW_RETRY missing" message indicates that
> handle_userfault() saw a page fault request without the ALLOW_RETRY flag
> set, hence userfaultfd cannot remotely resolve it (because the caller was
> asking for an immediate resolution, aka FAULT_FLAG_NOWAIT, while remote
> faults can take time).
> With that, get_map_page() failed and the irq was lost.
>
> We should not be strictly in an atomic environment here and the worker
> should be sleepable (the call is done during an ioctl from userspace),
> so we can allow adapter_indicators_set() to just sleep waiting for the
> remote fault instead.
>
> Link: https://issues.redhat.com/browse/RHEL-42486
> Signed-off-by: Peter Xu <peterx@redhat.com>
> [thuth: Assembled patch description and fixed some cosmetic issues]
> Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
> Note: Instructions for reproducing the bug can be found in the ticket here:
> https://issues.redhat.com/browse/RHEL-42486?focusedId=26661116#comment-26661116
>
> arch/s390/kvm/interrupt.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 60c360c18690f..dcce826ae9875 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -2777,12 +2777,19 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
>
>  static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
>  {
> +	struct mm_struct *mm = kvm->mm;
>  	struct page *page = NULL;
> +	int locked = 1;
> +
> +	if (mmget_not_zero(mm)) {
> +		mmap_read_lock(mm);
> +		get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
> +				      &page, &locked);
> +		if (locked)
> +			mmap_read_unlock(mm);
> +		mmput(mm);
> +	}
>
> -	mmap_read_lock(kvm->mm);
> -	get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
> -			      &page, NULL);
> -	mmap_read_unlock(kvm->mm);
>  	return page;
>  }
>
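For readers unfamiliar with the `locked` convention that the hunk above relies on, here is a minimal userspace sketch of it. This is a toy model and not kernel code: `toy_mm`, `toy_get_page()` and every other `toy_*` name are hypothetical stand-ins, and the real get_user_pages_remote() does considerably more. The sketch only illustrates that the caller passes `&locked`, the callee may drop the lock in order to wait, and the caller must then unlock only if `locked` is still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm->mmap_lock: just counts read holders. */
struct toy_mm {
	int readers;
};

static void toy_read_lock(struct toy_mm *mm)
{
	mm->readers++;
}

static void toy_read_unlock(struct toy_mm *mm)
{
	assert(mm->readers > 0);	/* catches a double unlock */
	mm->readers--;
}

/*
 * Toy analogue of a page lookup that may have to wait. If the page is
 * not resident and the caller passed a 'locked' pointer, the lock is
 * dropped before "waiting" and *locked is cleared to report that.
 * With locked == NULL the function may not wait, so it fails instead.
 * Returns the number of pages obtained (0 or 1).
 */
static int toy_get_page(struct toy_mm *mm, bool resident, int *locked)
{
	if (resident)
		return 1;		/* found, lock still held */
	if (!locked)
		return 0;		/* not allowed to wait */
	toy_read_unlock(mm);		/* drop the lock before sleeping */
	*locked = 0;			/* tell the caller it is gone */
	/* ... a real implementation would wait for the fault here ... */
	return 1;
}

/* Caller shaped like the patched get_map_page(): may wait. */
static int toy_get_map_page(struct toy_mm *mm, bool resident)
{
	int locked = 1;
	int got;

	toy_read_lock(mm);
	got = toy_get_page(mm, resident, &locked);
	if (locked)			/* unlock only if still held */
		toy_read_unlock(mm);
	return got;
}

/* Caller shaped like the old code: passes NULL, so it cannot wait. */
static int toy_get_map_page_nowait(struct toy_mm *mm, bool resident)
{
	int got;

	toy_read_lock(mm);
	got = toy_get_page(mm, resident, NULL);
	toy_read_unlock(mm);
	return got;
}
```

In this model, toy_get_map_page_nowait() returns 0 for a non-resident page, which corresponds to the lost interrupt described in the commit message, while toy_get_map_page() succeeds and still leaves the lock balanced.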
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Janosch Frank @ 2025-08-26 11:43 UTC
To: Thomas Huth, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
On 8/21/25 5:23 PM, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
Do we want to add this?
Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Thomas Huth @ 2025-08-26 12:16 UTC
To: Janosch Frank, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
On 26/08/2025 13.43, Janosch Frank wrote:
> On 8/21/25 5:23 PM, Thomas Huth wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> When you run a KVM guest with vhost-net and migrate that guest to
>> another host, and you immediately enable postcopy after starting the
>> migration, there is a big chance that the network connection of the
>> guest won't work anymore on the destination side after the migration.
>
> Do we want to add this?
>
> Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
Yes, that sounds like a good idea. Please add it when picking up the patch!
Thanks,
Thomas
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Christian Borntraeger @ 2025-09-04 11:12 UTC
To: Thomas Huth, Janosch Frank, Claudio Imbrenda, kvm
Cc: Peter Xu, David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Sven Schnelle, linux-s390, linux-kernel,
Douglas Freimuth, Matthew Rosato
CC Douglas, since he is looking into kvm_arch_set_irq_inatomic() and this change might have implications for that work.
On 21.08.25 at 17:23, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
>
> With a debug kernel v6.16.0, there is also a call trace that looks
> like this:
>
> FAULT_FLAG_ALLOW_RETRY missing 881
> CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
> Hardware name: IBM 3931 LA1 400 (LPAR)
> Workqueue: events irqfd_inject [kvm]
> Call Trace:
> [<00003173cbecc634>] dump_stack_lvl+0x104/0x168
> [<00003173cca69588>] handle_userfault+0xde8/0x1310
> [<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
> [<00003173cc759212>] __handle_mm_fault+0x452/0xa00
> [<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
> [<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
> [<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
> [<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
> [<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
> [<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
> [<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
> [<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
> [<00003173cc00c9ec>] process_one_work+0x87c/0x1490
> [<00003173cc00dda6>] worker_thread+0x7a6/0x1010
> [<00003173cc02dc36>] kthread+0x3b6/0x710
> [<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
> [<00003173cdd737ca>] ret_from_fork+0xa/0x30
> 3 locks held by kworker/6:2/549:
> #0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
> #1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
> #2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
>
> The "FAULT_FLAG_ALLOW_RETRY missing" message indicates that
> handle_userfault() saw a page fault request without the ALLOW_RETRY flag
> set, hence userfaultfd cannot remotely resolve it (because the caller was
> asking for an immediate resolution, aka FAULT_FLAG_NOWAIT, while remote
> faults can take time).
> With that, get_map_page() failed and the irq was lost.
>
> We should not be strictly in an atomic environment here and the worker
> should be sleepable (the call is done during an ioctl from userspace),
> so we can allow adapter_indicators_set() to just sleep waiting for the
> remote fault instead.
>
> Link: https://issues.redhat.com/browse/RHEL-42486
> Signed-off-by: Peter Xu <peterx@redhat.com>
> [thuth: Assembled patch description and fixed some cosmetic issues]
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
> Note: Instructions for reproducing the bug can be found in the ticket here:
> https://issues.redhat.com/browse/RHEL-42486?focusedId=26661116#comment-26661116
>
> arch/s390/kvm/interrupt.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 60c360c18690f..dcce826ae9875 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -2777,12 +2777,19 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
>
>  static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
>  {
> +	struct mm_struct *mm = kvm->mm;
>  	struct page *page = NULL;
> +	int locked = 1;
> +
> +	if (mmget_not_zero(mm)) {
> +		mmap_read_lock(mm);
> +		get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
> +				      &page, &locked);
> +		if (locked)
> +			mmap_read_unlock(mm);
> +		mmput(mm);
> +	}
>
> -	mmap_read_lock(kvm->mm);
> -	get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
> -			      &page, NULL);
> -	mmap_read_unlock(kvm->mm);
>  	return page;
>  }
>