* [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Thomas Huth @ 2025-08-21 15:23 UTC
To: Janosch Frank, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
From: Thomas Huth <thuth@redhat.com>
When you run a KVM guest with vhost-net and migrate that guest to
another host, and you immediately enable postcopy after starting the
migration, there is a big chance that the network connection of the
guest won't work anymore on the destination side after the migration.
With a debug kernel v6.16.0, there is also a call trace that looks
like this:
FAULT_FLAG_ALLOW_RETRY missing 881
CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
Hardware name: IBM 3931 LA1 400 (LPAR)
Workqueue: events irqfd_inject [kvm]
Call Trace:
[<00003173cbecc634>] dump_stack_lvl+0x104/0x168
[<00003173cca69588>] handle_userfault+0xde8/0x1310
[<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
[<00003173cc759212>] __handle_mm_fault+0x452/0xa00
[<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
[<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
[<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
[<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
[<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
[<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
[<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
[<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
[<00003173cc00c9ec>] process_one_work+0x87c/0x1490
[<00003173cc00dda6>] worker_thread+0x7a6/0x1010
[<00003173cc02dc36>] kthread+0x3b6/0x710
[<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
[<00003173cdd737ca>] ret_from_fork+0xa/0x30
3 locks held by kworker/6:2/549:
#0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
#1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
#2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
The "FAULT_FLAG_ALLOW_RETRY missing" message indicates that handle_userfault()
saw a page fault request without the ALLOW_RETRY flag set, hence userfaultfd
cannot remotely resolve it (because the caller was asking for an immediate
resolution, aka FAULT_FLAG_NOWAIT, while remote faults can take time).
With that, get_map_page() failed and the irq was lost.
We should not be strictly in an atomic environment here and the worker
should be sleepable (the call is done during an ioctl from userspace),
so we can allow adapter_indicators_set() to just sleep waiting for the
remote fault instead.
Link: https://issues.redhat.com/browse/RHEL-42486
Signed-off-by: Peter Xu <peterx@redhat.com>
[thuth: Assembled patch description and fixed some cosmetic issues]
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
Note: Instructions for reproducing the bug can be found in the ticket here:
https://issues.redhat.com/browse/RHEL-42486?focusedId=26661116#comment-26661116
arch/s390/kvm/interrupt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 60c360c18690f..dcce826ae9875 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2777,12 +2777,19 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
{
+ struct mm_struct *mm = kvm->mm;
struct page *page = NULL;
+ int locked = 1;
+
+ if (mmget_not_zero(mm)) {
+ mmap_read_lock(mm);
+ get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
+ &page, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ mmput(mm);
+ }
- mmap_read_lock(kvm->mm);
- get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
- &page, NULL);
- mmap_read_unlock(kvm->mm);
return page;
}
--
2.50.1
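
Editorial sketch (not part of the patch): the locking convention that the diff above relies on can be modelled in plain userspace C. Passing a non-NULL `locked` pointer to get_user_pages_remote() allows the fault path to drop mmap_lock while it waits, and the caller must check `locked` afterwards to know whether it still owns the lock. Everything prefixed with `mock_` below is a hypothetical stand-in invented for illustration; it is heavily simplified and is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm_struct; it only tracks what the sketch needs. */
struct mock_mm {
	bool lock_held;		/* models mm->mmap_lock held for reading */
	bool page_present;	/* false models a page not yet migrated (postcopy) */
	int page;		/* stands in for struct page */
};

static void mock_read_lock(struct mock_mm *mm)
{
	assert(!mm->lock_held);
	mm->lock_held = true;
}

static void mock_read_unlock(struct mock_mm *mm)
{
	assert(mm->lock_held);
	mm->lock_held = false;
}

/*
 * Simplified model of the "locked" contract:
 *  - locked == NULL: the lock may not be dropped, so a fault that has to
 *    wait for userspace cannot be resolved and the lookup fails.
 *  - locked != NULL: the lock may be dropped while waiting; *locked == 0
 *    on return tells the caller that the lock is no longer held.
 * The negative return value is a placeholder, not a specific errno.
 */
static long mock_gup_remote(struct mock_mm *mm, int **page, int *locked)
{
	assert(mm->lock_held);
	if (!mm->page_present) {
		if (!locked)
			return -1;
		mock_read_unlock(mm);		/* sleep without holding the lock */
		mm->page_present = true;	/* pretend the remote fault completed */
		*locked = 0;
	}
	*page = &mm->page;
	return 1;
}

/* Mirrors the shape of the patched get_map_page(); allow_retry=false is the old behaviour. */
int *mock_get_map_page(struct mock_mm *mm, bool allow_retry)
{
	int *page = NULL;
	int locked = 1;

	mock_read_lock(mm);
	mock_gup_remote(mm, &page, allow_retry ? &locked : NULL);
	if (locked)		/* unlock only if the lock was not dropped for us */
		mock_read_unlock(mm);
	return page;
}
```

In this model, calling mock_get_map_page() with allow_retry set to false on a missing page returns NULL, which corresponds to the lost interrupt described in the patch description, while allow_retry set to true returns the page and leaves the lock released exactly once.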
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Janosch Frank @ 2025-08-25 7:58 UTC
To: Thomas Huth, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
Acked-by: Janosch Frank <frankja@linux.ibm.com>
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Claudio Imbrenda @ 2025-08-25 13:34 UTC
To: Thomas Huth
Cc: Janosch Frank, kvm, Peter Xu, Christian Borntraeger,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Sven Schnelle, linux-s390, linux-kernel
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Janosch Frank @ 2025-08-26 11:43 UTC
To: Thomas Huth, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
On 8/21/25 5:23 PM, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
Do we want to add this?
Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Thomas Huth @ 2025-08-26 12:16 UTC
To: Janosch Frank, Claudio Imbrenda, kvm
Cc: Peter Xu, Christian Borntraeger, David Hildenbrand,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
linux-s390, linux-kernel
On 26/08/2025 13.43, Janosch Frank wrote:
> On 8/21/25 5:23 PM, Thomas Huth wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> When you run a KVM guest with vhost-net and migrate that guest to
>> another host, and you immediately enable postcopy after starting the
>> migration, there is a big chance that the network connection of the
>> guest won't work anymore on the destination side after the migration.
>
> Do we want to add this?
>
> Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
Yes, that sounds like a good idea, please add it when picking up the patch!
Thanks,
Thomas
* Re: [PATCH] KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
From: Christian Borntraeger @ 2025-09-04 11:12 UTC
To: Thomas Huth, Janosch Frank, Claudio Imbrenda, kvm
Cc: Peter Xu, David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Sven Schnelle, linux-s390, linux-kernel,
Douglas Freimuth, Matthew Rosato
CC Douglas, since Doug is looking into kvm_arch_set_irq_inatomic and this might have implications.