* [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
@ 2024-03-05 14:11 Nico Boehr
2024-03-05 18:12 ` Marc Hartmayer
2024-03-12 6:37 ` Nicholas Piggin
0 siblings, 2 replies; 6+ messages in thread
From: Nico Boehr @ 2024-03-05 14:11 UTC (permalink / raw)
To: frankja, imbrenda, thuth, npiggin; +Cc: kvm, linux-s390
Sometimes, QEMU needs a bit longer to remove the incoming migration
socket. This happens in some environments on s390x for the
migration-skey-sequential test.
Instead of directly erroring out, wait for the removal of the socket.
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
---
scripts/arch-run.bash | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
index 2214d940cf7d..413f3eda8cb8 100644
--- a/scripts/arch-run.bash
+++ b/scripts/arch-run.bash
@@ -237,12 +237,8 @@ do_migration ()
echo > ${dst_infifo}
rm ${dst_infifo}
- # Ensure the incoming socket is removed, ready for next destination
- if [ -S ${dst_incoming} ] ; then
- echo "ERROR: Incoming migration socket not removed after migration." >& 2
- qmp ${dst_qmp} '"quit"'> ${dst_qmpout} 2>/dev/null
- return 2
- fi
+ # Wait for the incoming socket being removed, ready for next destination
+ while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
wait ${live_pid}
ret=$?
--
2.44.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
2024-03-05 14:11 [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed Nico Boehr
@ 2024-03-05 18:12 ` Marc Hartmayer
2024-03-06 13:03 ` Nico Boehr
2024-03-12 6:37 ` Nicholas Piggin
1 sibling, 1 reply; 6+ messages in thread
From: Marc Hartmayer @ 2024-03-05 18:12 UTC (permalink / raw)
To: Nico Boehr, frankja, imbrenda, thuth, npiggin; +Cc: kvm, linux-s390
On Tue, Mar 05, 2024 at 03:11 PM +0100, Nico Boehr <nrb@linux.ibm.com> wrote:
> Sometimes, QEMU needs a bit longer to remove the incoming migration
> socket. This happens in some environments on s390x for the
> migration-skey-sequential test.
>
> Instead of directly erroring out, wait for the removal of the socket.
>
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> ---
> scripts/arch-run.bash | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
> index 2214d940cf7d..413f3eda8cb8 100644
> --- a/scripts/arch-run.bash
> +++ b/scripts/arch-run.bash
> @@ -237,12 +237,8 @@ do_migration ()
> echo > ${dst_infifo}
> rm ${dst_infifo}
>
> - # Ensure the incoming socket is removed, ready for next destination
> - if [ -S ${dst_incoming} ] ; then
> - echo "ERROR: Incoming migration socket not removed after migration." >& 2
> - qmp ${dst_qmp} '"quit"'> ${dst_qmpout} 2>/dev/null
> - return 2
> - fi
> + # Wait for the incoming socket being removed, ready for next destination
> + while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
But now, you have removed the erroring out path completely. Maybe wait
max. 3s and then bail out?
[…snip]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
2024-03-05 18:12 ` Marc Hartmayer
@ 2024-03-06 13:03 ` Nico Boehr
2024-03-12 5:39 ` Nicholas Piggin
0 siblings, 1 reply; 6+ messages in thread
From: Nico Boehr @ 2024-03-06 13:03 UTC (permalink / raw)
To: Marc Hartmayer, frankja, imbrenda, npiggin, thuth; +Cc: kvm, linux-s390
Quoting Marc Hartmayer (2024-03-05 19:12:16)
[...]
> > diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
> > index 2214d940cf7d..413f3eda8cb8 100644
> > --- a/scripts/arch-run.bash
> > +++ b/scripts/arch-run.bash
> > @@ -237,12 +237,8 @@ do_migration ()
> > echo > ${dst_infifo}
> > rm ${dst_infifo}
> >
> > - # Ensure the incoming socket is removed, ready for next destination
> > - if [ -S ${dst_incoming} ] ; then
> > - echo "ERROR: Incoming migration socket not removed after migration." >& 2
> > - qmp ${dst_qmp} '"quit"'> ${dst_qmpout} 2>/dev/null
> > - return 2
> > - fi
> > + # Wait for the incoming socket being removed, ready for next destination
> > + while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
>
> But now, you have removed the erroring out path completely. Maybe wait
> max. 3s and then bail out?
Well, I was considering that, but:
- I'm not a huge fan of fine-grained timeouts. Fine-tuning a gazillion
timeouts is not a fun task, I think you know what I'm talking about :)
- a number of other places that can potentially get stuck also don't have
proper timeouts (like waiting for the QMP socket or the migration
socket), so for a proper solution we'd need to touch a lot of other
places...
What I think we really want is a migration timeout. That isn't quite simple
since we can't easily pull $(timeout_cmd) before $(panic_cmd) and
$(migration_cmd) in run-scripts...
My suggestion: let's fix this issue and work on the timeout as a seperate
fix.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
2024-03-06 13:03 ` Nico Boehr
@ 2024-03-12 5:39 ` Nicholas Piggin
0 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2024-03-12 5:39 UTC (permalink / raw)
To: Nico Boehr, Marc Hartmayer, frankja, imbrenda, thuth; +Cc: kvm, linux-s390
On Wed Mar 6, 2024 at 11:03 PM AEST, Nico Boehr wrote:
> Quoting Marc Hartmayer (2024-03-05 19:12:16)
> [...]
> > > diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
> > > index 2214d940cf7d..413f3eda8cb8 100644
> > > --- a/scripts/arch-run.bash
> > > +++ b/scripts/arch-run.bash
> > > @@ -237,12 +237,8 @@ do_migration ()
> > > echo > ${dst_infifo}
> > > rm ${dst_infifo}
> > >
> > > - # Ensure the incoming socket is removed, ready for next destination
> > > - if [ -S ${dst_incoming} ] ; then
> > > - echo "ERROR: Incoming migration socket not removed after migration." >& 2
> > > - qmp ${dst_qmp} '"quit"'> ${dst_qmpout} 2>/dev/null
> > > - return 2
> > > - fi
> > > + # Wait for the incoming socket being removed, ready for next destination
> > > + while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
> >
> > But now, you have removed the erroring out path completely. Maybe wait
> > max. 3s and then bail out?
>
> Well, I was considering that, but:
> - I'm not a huge fan of fine-grained timeouts. Fine-tuning a gazillion
> timeouts is not a fun task, I think you know what I'm talking about :)
> - a number of other places that can potentially get stuck also don't have
> proper timeouts (like waiting for the QMP socket or the migration
> socket), so for a proper solution we'd need to touch a lot of other
> places...
>
> What I think we really want is a migration timeout. That isn't quite simple
> since we can't easily pull $(timeout_cmd) before $(panic_cmd) and
> $(migration_cmd) in run-scripts...
>
> My suggestion: let's fix this issue and work on the timeout as a seperate
> fix.
The migration tests as a whole have big trouble with timeouts already.
The problem is timeouts are implemented with the 'timeout' command but
that is specific to the QEMU process so especially the migration harness
with lots of loops can easily hang.
I tried a few ways to address this like starting a background 'sleep ;
kill' shell, but that gets very complicated to handle interrupts properly
that kill the stuck bits and having the harness report the error
sanely. I'm thinking a subshell that runs the entire test case and start
*that* with 'timeout' might be a better approach.
So I agree, let's take patches that fix behaviour when there are no
timeouts, and address the timeout problem as a whole as a separate
effort rather than worrying too much about individual loops just yet.
(It's a fair review comment to ask though).
Thanks,
Nick
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
2024-03-05 14:11 [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed Nico Boehr
2024-03-05 18:12 ` Marc Hartmayer
@ 2024-03-12 6:37 ` Nicholas Piggin
2024-03-12 6:45 ` Thomas Huth
1 sibling, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2024-03-12 6:37 UTC (permalink / raw)
To: Nico Boehr, frankja, imbrenda, thuth; +Cc: kvm, linux-s390
On Wed Mar 6, 2024 at 12:11 AM AEST, Nico Boehr wrote:
> Sometimes, QEMU needs a bit longer to remove the incoming migration
> socket. This happens in some environments on s390x for the
> migration-skey-sequential test.
>
> Instead of directly erroring out, wait for the removal of the socket.
>
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
It's interesting that the incoming socket is not removed after
QMP says migration complete. I guess that's by design, but have
you checked the QEMU code whether it's intentional?
I guess it's code like this - in migration/migration.c
/*
* This must happen after any state changes since as soon as an external
* observer sees this event they might start to prod at the VM assuming
* it's ready to use.
*/
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COMPLETED);
migration_incoming_state_destroy();
So, it looks like a good fix. And probably not just s390x specific
it might be just unlucky timing.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> scripts/arch-run.bash | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
> index 2214d940cf7d..413f3eda8cb8 100644
> --- a/scripts/arch-run.bash
> +++ b/scripts/arch-run.bash
> @@ -237,12 +237,8 @@ do_migration ()
> echo > ${dst_infifo}
> rm ${dst_infifo}
>
> - # Ensure the incoming socket is removed, ready for next destination
> - if [ -S ${dst_incoming} ] ; then
> - echo "ERROR: Incoming migration socket not removed after migration." >& 2
> - qmp ${dst_qmp} '"quit"'> ${dst_qmpout} 2>/dev/null
> - return 2
> - fi
> + # Wait for the incoming socket being removed, ready for next destination
> + while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
>
> wait ${live_pid}
> ret=$?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed
2024-03-12 6:37 ` Nicholas Piggin
@ 2024-03-12 6:45 ` Thomas Huth
0 siblings, 0 replies; 6+ messages in thread
From: Thomas Huth @ 2024-03-12 6:45 UTC (permalink / raw)
To: Nicholas Piggin, Nico Boehr, frankja, imbrenda; +Cc: kvm, linux-s390
On 12/03/2024 07.37, Nicholas Piggin wrote:
> On Wed Mar 6, 2024 at 12:11 AM AEST, Nico Boehr wrote:
>> Sometimes, QEMU needs a bit longer to remove the incoming migration
>> socket. This happens in some environments on s390x for the
>> migration-skey-sequential test.
>>
>> Instead of directly erroring out, wait for the removal of the socket.
>>
>> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
>
> It's interesting that the incoming socket is not removed after
> QMP says migration complete. I guess that's by design, but have
> you checked the QEMU code whether it's intentional?
>
> I guess it's code like this - in migration/migration.c
>
> /*
> * This must happen after any state changes since as soon as an external
> * observer sees this event they might start to prod at the VM assuming
> * it's ready to use.
> */
> migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> MIGRATION_STATUS_COMPLETED);
> migration_incoming_state_destroy();
>
> So, it looks like a good fix. And probably not just s390x specific
> it might be just unlucky timing.
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
All right, I pushed this as a fix now to the repository.
Thomas
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-03-12 6:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-05 14:11 [kvm-unit-tests PATCH v1] arch-run: Wait for incoming socket being removed Nico Boehr
2024-03-05 18:12 ` Marc Hartmayer
2024-03-06 13:03 ` Nico Boehr
2024-03-12 5:39 ` Nicholas Piggin
2024-03-12 6:37 ` Nicholas Piggin
2024-03-12 6:45 ` Thomas Huth
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox