qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests
@ 2024-05-25 13:12 Nicholas Piggin
  2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
  To: qemu-s390x
  Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel

I don't know s390x enough to know if this is the right fix, but I
could debug the migration hangs this far at least (and the patch
fixes the condition that would previously result in a hang on the
qtest).

Also we could enable the test_ignore_shared test that seems to work
on s390x and ppc64 at least.

Thanks,
Nick

Nicholas Piggin (3):
  hw/intc/s390_flic: Migrate pending state
  tests/qtest/migration-test: enable on s390x
  tests/qtest/migration-test: Enable test_ignore_shared

 hw/intc/s390_flic.c          |  1 +
 tests/qtest/migration-test.c | 25 ++++++++-----------------
 2 files changed, 9 insertions(+), 17 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
  2024-05-26 15:53   ` David Hildenbrand
  2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
  To: qemu-s390x
  Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel

The flic pending state is not migrated, so if the machine is migrated
while an interrupt is pending, it can be lost. This shows up in
qtest migration test, an extint is pending (due to console writes?)
and the CPU waits via s390_cpu_set_psw and expects the interrupt to
wake it. However when the flic pending state is lost, s390_cpu_has_int
returns false, so s390_cpu_exec_interrupt falls through to halting
again.

Fix this by migrating pending. This prevents the qtest from hanging.
Does service_param need to be migrated? Or the IO lists?

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/intc/s390_flic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6771645699..b70cf2295a 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
     .fields = (const VMStateField[]) {
         VMSTATE_UINT8(simm, QEMUS390FLICState),
         VMSTATE_UINT8(nimm, QEMUS390FLICState),
+        VMSTATE_UINT32(pending, QEMUS390FLICState),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
  2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
  2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
  2024-05-27  5:46   ` Prasad Pandit
  2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
  2024-05-27  7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth
  3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
  To: qemu-s390x
  Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel

s390x is more stable now. Enable it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/qtest/migration-test.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 94d5057857..7987faaded 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
     migration_test_add("/migration/analyze-script", test_analyze_script);
 #endif
 
-    /*
-     * On s390x, the test seems to be touchy with TCG, perhaps due to race
-     * conditions on dirty bits, so disable it there until the problems are
-     * resolved.
-     */
-    if (g_str_equal(arch, "s390x") && !has_kvm) {
-        g_test_message("Skipping tests: s390x host with KVM is required");
-        goto test_add_done;
-    }
-
     if (is_x86) {
         migration_test_add("/migration/precopy/unix/suspend/live",
                            test_precopy_unix_suspend_live);
@@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
                            test_vcpu_dirty_limit);
     }
 
-test_add_done:
-
     ret = g_test_run();
 
     g_assert_cmpint(ret, ==, 0);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
  2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
  2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
  2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
  2024-05-27 12:42   ` Fabiano Rosas
  2024-05-27  7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth
  3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
  To: qemu-s390x
  Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel, Yury Kotov,
	Dr . David Alan Gilbert

This was said to be broken on aarch64, but if it works on others,
let's try enable it. It's already starting to bitrot...

Cc: Yury Kotov <yury-kotov@yandex-team.ru>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/qtest/migration-test.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7987faaded..2bcdc33b7c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
 #endif /* CONFIG_TASN1 */
 #endif /* CONFIG_GNUTLS */
 
-#if 0
-/* Currently upset on aarch64 TCG */
 static void test_ignore_shared(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
+    MigrateStart args = {
+        .use_shmem = true,
+    };
 
-    if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
 
     test_migrate_end(from, to, true);
 }
-#endif
 
 static void *
 test_migrate_xbzrle_start(QTestState *from,
@@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
 #endif /* CONFIG_TASN1 */
 #endif /* CONFIG_GNUTLS */
 
-    /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
+    if (strcmp(arch, "aarch64") == 0) { /* Currently upset on aarch64 TCG */
+        migration_test_add("/migration/ignore_shared", test_ignore_shared);
+    }
+
 #ifndef _WIN32
     migration_test_add("/migration/precopy/fd/tcp",
                        test_migrate_precopy_fd_socket);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
@ 2024-05-26 15:53   ` David Hildenbrand
  2024-05-26 19:44     ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-05-26 15:53 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
	Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel

Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> The flic pending state is not migrated, so if the machine is migrated
> while an interrupt is pending, it can be lost. This shows up in
> qtest migration test, an extint is pending (due to console writes?)
> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> wake it. However when the flic pending state is lost, s390_cpu_has_int
> returns false, so s390_cpu_exec_interrupt falls through to halting
> again.
> 
> Fix this by migrating pending. This prevents the qtest from hanging.
> Does service_param need to be migrated? Or the IO lists?
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/intc/s390_flic.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> index 6771645699..b70cf2295a 100644
> --- a/hw/intc/s390_flic.c
> +++ b/hw/intc/s390_flic.c
> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>       .fields = (const VMStateField[]) {
>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>           VMSTATE_END_OF_LIST()
>       }
>   };

Likely you have to handle this using QEMU compat machines.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-26 15:53   ` David Hildenbrand
@ 2024-05-26 19:44     ` Richard Henderson
  2024-05-26 20:33       ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2024-05-26 19:44 UTC (permalink / raw)
  To: David Hildenbrand, Nicholas Piggin, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
	Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel

On 5/26/24 08:53, David Hildenbrand wrote:
> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>> The flic pending state is not migrated, so if the machine is migrated
>> while an interrupt is pending, it can be lost. This shows up in
>> qtest migration test, an extint is pending (due to console writes?)
>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>> returns false, so s390_cpu_exec_interrupt falls through to halting
>> again.
>>
>> Fix this by migrating pending. This prevents the qtest from hanging.
>> Does service_param need to be migrated? Or the IO lists?
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   hw/intc/s390_flic.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>> index 6771645699..b70cf2295a 100644
>> --- a/hw/intc/s390_flic.c
>> +++ b/hw/intc/s390_flic.c
>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>       .fields = (const VMStateField[]) {
>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>           VMSTATE_END_OF_LIST()
>>       }
>>   };
> 
> Likely you have to handle this using QEMU compat machines.

Well, since existing migration is broken, I don't think you have to preserve 
compatibility.  But you do have to bump the version number.


r~



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-26 19:44     ` Richard Henderson
@ 2024-05-26 20:33       ` David Hildenbrand
  2024-05-27  5:51         ` Thomas Huth
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-05-26 20:33 UTC (permalink / raw)
  To: Richard Henderson, Nicholas Piggin, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
	Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel

Am 26.05.24 um 21:44 schrieb Richard Henderson:
> On 5/26/24 08:53, David Hildenbrand wrote:
>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>> The flic pending state is not migrated, so if the machine is migrated
>>> while an interrupt is pending, it can be lost. This shows up in
>>> qtest migration test, an extint is pending (due to console writes?)
>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>> again.
>>>
>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>> Does service_param need to be migrated? Or the IO lists?
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>   hw/intc/s390_flic.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>> index 6771645699..b70cf2295a 100644
>>> --- a/hw/intc/s390_flic.c
>>> +++ b/hw/intc/s390_flic.c
>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>>       .fields = (const VMStateField[]) {
>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>           VMSTATE_END_OF_LIST()
>>>       }
>>>   };
>>
>> Likely you have to handle this using QEMU compat machines.
> 
> Well, since existing migration is broken, I don't think you have to preserve 

Migration is broken only in some case "while an interrupt is pending, it can be 
lost".

> compatibility.  But you do have to bump the version number.

Looking at it, this is TCG only, so likely we don't care that much about 
migration compatibility. But I have no idea what level of compatibility we want 
to support there.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
  2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
@ 2024-05-27  5:46   ` Prasad Pandit
  2024-05-27  7:04     ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Prasad Pandit @ 2024-05-27  5:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
	Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, qemu-devel

Hi,

On Sat, 25 May 2024 at 18:44, Nicholas Piggin <npiggin@gmail.com> wrote:
> s390x is more stable now. Enable it.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tests/qtest/migration-test.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 94d5057857..7987faaded 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
>      migration_test_add("/migration/analyze-script", test_analyze_script);
>  #endif
>
> -    /*
> -     * On s390x, the test seems to be touchy with TCG, perhaps due to race
> -     * conditions on dirty bits, so disable it there until the problems are
> -     * resolved.
> -     */

    -> https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg04774.html

* Above patch (not reviewed yet) adds comment about sporadic problems
on s390x, and this patch says s390x is stable now? It'll help to
mention in the commit log - what changed to make it stable in 1 day.

* IIUC, this and the ppc64 patch above enable 'migration-test' for
s390x and ppc64 platforms, when KVM is not available for them? ie.
When running s390x & ppc64 migration-tests with TCG on non s390x or
non-ppc64 machines, right? Maybe the commit message could say
something to the effect of - enable s390x and ppc64 'migration-test'
to run with TCG across platforms where KVM for s390x  OR  KVM for
ppc64 is not available.

> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> -        g_test_message("Skipping tests: s390x host with KVM is required");
> -        goto test_add_done;
> -    }
> -
>      if (is_x86) {
>          migration_test_add("/migration/precopy/unix/suspend/live",
>                             test_precopy_unix_suspend_live);
> @@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
>                             test_vcpu_dirty_limit);
>      }
>
> -test_add_done:
> -
>      ret = g_test_run();
>
>      g_assert_cmpint(ret, ==, 0);
> --

* Otherwise, the change looks okay to enable 'migration-test' for
s390x with TCG IIUC.

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-26 20:33       ` David Hildenbrand
@ 2024-05-27  5:51         ` Thomas Huth
  2024-05-27  7:11           ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Huth @ 2024-05-27  5:51 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, Nicholas Piggin, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel

On 26/05/2024 22.33, David Hildenbrand wrote:
> Am 26.05.24 um 21:44 schrieb Richard Henderson:
>> On 5/26/24 08:53, David Hildenbrand wrote:
>>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>>> The flic pending state is not migrated, so if the machine is migrated
>>>> while an interrupt is pending, it can be lost. This shows up in
>>>> qtest migration test, an extint is pending (due to console writes?)
>>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>>> again.
>>>>
>>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>>> Does service_param need to be migrated? Or the IO lists?
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>>   hw/intc/s390_flic.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>>> index 6771645699..b70cf2295a 100644
>>>> --- a/hw/intc/s390_flic.c
>>>> +++ b/hw/intc/s390_flic.c
>>>> @@ -369,6 +369,7 @@ static const VMStateDescription 
>>>> qemu_s390_flic_vmstate = {
>>>>       .fields = (const VMStateField[]) {
>>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>>           VMSTATE_END_OF_LIST()
>>>>       }
>>>>   };
>>>
>>> Likely you have to handle this using QEMU compat machines.
>>
>> Well, since existing migration is broken, I don't think you have to preserve 
> 
> Migration is broken only in some case "while an interrupt is pending, it can 
> be lost".
> 
>> compatibility.  But you do have to bump the version number.
> 
> Looking at it, this is TCG only, so likely we don't care that much about 
> migration compatibility. But I have no idea what level of compatibility we 
> want to support there.

Yes, this seems to only affect the TCG-only flic device, where migration has 
never been working very reliably. So I think we don't really need the whole 
compat-machine dance here. But I think we should at least bump the 
version_id to 2 now and then use

    VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);

for the new field. That way we would at least support forward migrations 
without too much hassle.

  Thomas



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
  2024-05-27  5:46   ` Prasad Pandit
@ 2024-05-27  7:04     ` Nicholas Piggin
  2024-05-27  7:40       ` Prasad Pandit
  0 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-27  7:04 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
	Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, qemu-devel

On Mon May 27, 2024 at 3:46 PM AEST, Prasad Pandit wrote:
> Hi,
>
> On Sat, 25 May 2024 at 18:44, Nicholas Piggin <npiggin@gmail.com> wrote:
> > s390x is more stable now. Enable it.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  tests/qtest/migration-test.c | 12 ------------
> >  1 file changed, 12 deletions(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 94d5057857..7987faaded 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
> >      migration_test_add("/migration/analyze-script", test_analyze_script);
> >  #endif
> >
> > -    /*
> > -     * On s390x, the test seems to be touchy with TCG, perhaps due to race
> > -     * conditions on dirty bits, so disable it there until the problems are
> > -     * resolved.
> > -     */
>
>     -> https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg04774.html
>
> * Above patch (not reviewed yet) adds comment about sporadic problems
> on s390x, and this patch says s390x is stable now? It'll help to
> mention in the commit log - what changed to make it stable in 1 day.

Patch 1 of this series.

> * IIUC, this and the ppc64 patch above enable 'migration-test' for
> s390x and ppc64 platforms, when KVM is not available for them? ie.
> When running s390x & ppc64 migration-tests with TCG on non s390x or
> non-ppc64 machines, right? Maybe the commit message could say
> something to the effect of - enable s390x and ppc64 'migration-test'
> to run with TCG across platforms where KVM for s390x  OR  KVM for
> ppc64 is not available.

Yes they should be called "enable for TCG" indeed.

> > -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> > -        g_test_message("Skipping tests: s390x host with KVM is required");
> > -        goto test_add_done;
> > -    }
> > -
> >      if (is_x86) {
> >          migration_test_add("/migration/precopy/unix/suspend/live",
> >                             test_precopy_unix_suspend_live);
> > @@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
> >                             test_vcpu_dirty_limit);
> >      }
> >
> > -test_add_done:
> > -
> >      ret = g_test_run();
> >
> >      g_assert_cmpint(ret, ==, 0);
> > --
>
> * Otherwise, the change looks okay to enable 'migration-test' for
> s390x with TCG IIUC.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
  2024-05-27  5:51         ` Thomas Huth
@ 2024-05-27  7:11           ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-27  7:11 UTC (permalink / raw)
  To: Thomas Huth, David Hildenbrand, Richard Henderson, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, Peter Xu, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini, qemu-devel

On Mon May 27, 2024 at 3:51 PM AEST, Thomas Huth wrote:
> On 26/05/2024 22.33, David Hildenbrand wrote:
> > Am 26.05.24 um 21:44 schrieb Richard Henderson:
> >> On 5/26/24 08:53, David Hildenbrand wrote:
> >>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> >>>> The flic pending state is not migrated, so if the machine is migrated
> >>>> while an interrupt is pending, it can be lost. This shows up in
> >>>> qtest migration test, an extint is pending (due to console writes?)
> >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
> >>>> returns false, so s390_cpu_exec_interrupt falls through to halting
> >>>> again.
> >>>>
> >>>> Fix this by migrating pending. This prevents the qtest from hanging.
> >>>> Does service_param need to be migrated? Or the IO lists?
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>>   hw/intc/s390_flic.c | 1 +
> >>>>   1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> >>>> index 6771645699..b70cf2295a 100644
> >>>> --- a/hw/intc/s390_flic.c
> >>>> +++ b/hw/intc/s390_flic.c
> >>>> @@ -369,6 +369,7 @@ static const VMStateDescription 
> >>>> qemu_s390_flic_vmstate = {
> >>>>       .fields = (const VMStateField[]) {
> >>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
> >>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> >>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
> >>>>           VMSTATE_END_OF_LIST()
> >>>>       }
> >>>>   };
> >>>
> >>> Likely you have to handle this using QEMU compat machines.
> >>
> >> Well, since existing migration is broken, I don't think you have to preserve 
> > 
> > Migration is broken only in some case "while an interrupt is pending, it can 
> > be lost".
> > 
> >> compatibility.  But you do have to bump the version number.
> > 
> > Looking at it, this is TCG only, so likely we don't care that much about 
> > migration compatibility. But I have no idea what level of compatibility we 
> > want to support there.
>
> Yes, this seems to only affect the TCG-only flic device, where migration has 
> never been working very reliably. So I think we don't really need the whole 
> compat-machine dance here. But I think we should at least bump the 
> version_id to 2 now and then use
>
>     VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);
>
> for the new field. That way we would at least support forward migrations 
> without too much hassle.

Well if you could rebuild this state by checking sources or something
it might be possible to avoid. Or if you could always mark pending and
software can tolerate superflous. But that also seems like a lot of
headache for something that was always flaky.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests
  2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
                   ` (2 preceding siblings ...)
  2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
@ 2024-05-27  7:13 ` Thomas Huth
  3 siblings, 0 replies; 17+ messages in thread
From: Thomas Huth @ 2024-05-27  7:13 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-s390x
  Cc: Halil Pasic, Christian Borntraeger, David Hildenbrand, Peter Xu,
	Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel

On 25/05/2024 15.12, Nicholas Piggin wrote:
> I don't know s390x enough to know if this is the right fix, but I
> could debug the migration hangs this far at least (and the patch
> fixes the condition that would previously result in a hang on the
> qtest).

I've run this now for a while, and the hang seems to be gone, indeed!
Pretty cool, thanks for digging into this!

Tested-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
  2024-05-27  7:04     ` Nicholas Piggin
@ 2024-05-27  7:40       ` Prasad Pandit
  0 siblings, 0 replies; 17+ messages in thread
From: Prasad Pandit @ 2024-05-27  7:40 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
	Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, qemu-devel

On Mon, 27 May 2024 at 12:34, Nicholas Piggin <npiggin@gmail.com> wrote:
> > * Above patch (not reviewed yet) adds comment about sporadic problems
> > on s390x, and this patch says s390x is stable now? It'll help to
> > mention in the commit log - what changed to make it stable in 1 day.
>
> Patch 1 of this series.

* Ah, that one got filtered to another folder. It'll help to add
something about the 'flic pending' case being specific to migration of
TCG device and so it's reasonable to remove 'has_kvm' check and enable
migration-test to run with TCG.

> Yes they should be called "enable for TCG" indeed.
>

Ack with commit message changes:
  Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
  2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
@ 2024-05-27 12:42   ` Fabiano Rosas
  2024-05-27 14:56     ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Fabiano Rosas @ 2024-05-27 12:42 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-s390x
  Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Peter Xu, Laurent Vivier,
	Paolo Bonzini, qemu-devel, Yury Kotov, Dr . David Alan Gilbert

Nicholas Piggin <npiggin@gmail.com> writes:

> This was said to be broken on aarch64, but if it works on others,
> let's try enable it. It's already starting to bitrot...

Yeah, look at the state of this...

I don't know what the issue was on aarch64, but I'm all for enabling
this test globally and then we deal with the breakage if it ever
comes. I don't think it will.

However, there is an issue here still on all archs - which might very
well have been the original issue - which is the fact that the
containers on the Gitlab CI have limits on shared memory usage.
Unfortunately we cannot enable this test for the CI, so it needs a check
on the GITLAB_CI environment variable.

There's also the cpr-reboot test which got put under "flaky", that has
the same issue. That one should also have been under GITLAB_CI. From
that discussion:

  "We have an issue with this test on CI:
  
  $ df -h /dev/shm
  Filesystem      Size  Used Avail Use% Mounted on
  shm              64M     0   64M   0% /dev/shm
  
  These are shared CI runners, so AFAICT there's no way to increase the
  shared memory size.
  
  Reducing the memory for this single test also wouldn't work because we
  can run migration-test for different archs in parallel + there's the
  ivshmem_test which uses 4M.
  
  Maybe just leave it out of CI? Laptops will probably have enough shared
  memory to not hit this. If we add a warning comment to the test, might
  be enough." -- https://lore.kernel.org/all/87ttq5fvh7.fsf@suse.de

>
> Cc: Yury Kotov <yury-kotov@yandex-team.ru>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tests/qtest/migration-test.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7987faaded..2bcdc33b7c 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -#if 0
> -/* Currently upset on aarch64 TCG */
>  static void test_ignore_shared(void)
>  {
>      g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>      QTestState *from, *to;
> +    MigrateStart args = {
> +        .use_shmem = true,
> +    };
>  
> -    if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
> +    if (test_migrate_start(&from, &to, uri, &args)) {
>          return;
>      }
>  
> @@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
>  
>      test_migrate_end(from, to, true);
>  }
> -#endif
>  
>  static void *
>  test_migrate_xbzrle_start(QTestState *from,
> @@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -    /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
> +    if (strcmp(arch, "aarch64") == 0) { /* Currently upset on aarch64 TCG */
> +        migration_test_add("/migration/ignore_shared", test_ignore_shared);
> +    }
> +
>  #ifndef _WIN32
>      migration_test_add("/migration/precopy/fd/tcp",
>                         test_migrate_precopy_fd_socket);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
  2024-05-27 12:42   ` Fabiano Rosas
@ 2024-05-27 14:56     ` Peter Xu
  2024-05-27 15:11       ` Fabiano Rosas
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2024-05-27 14:56 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
	qemu-devel, Yury Kotov, Dr . David Alan Gilbert

On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> However, there is an issue here still on all archs - which might very
> well have been the original issue - which is the fact that the
> containers on the Gitlab CI have limits on shared memory usage.
> Unfortunately we cannot enable this test for the CI, so it needs a check
> on the GITLAB_CI environment variable.

Another option is we teach migration-test to detect whether memory_size of
shmem is available, skip if not.  It can be a sequence of:

  memfd_create()
  fallocate()
  ret = madvise(MADV_POPULATE_WRITE)

To be run at the entry of migration-test, and skip all use_shmem=true tests
if ret != 0, or any step failed above.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
  2024-05-27 14:56     ` Peter Xu
@ 2024-05-27 15:11       ` Fabiano Rosas
  2024-05-27 15:41         ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Fabiano Rosas @ 2024-05-27 15:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
	qemu-devel, Yury Kotov, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> writes:

> On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
>> However, there is an issue here still on all archs - which might very
>> well have been the original issue - which is the fact that the
>> containers on the Gitlab CI have limits on shared memory usage.
>> Unfortunately we cannot enable this test for the CI, so it needs a check
>> on the GITLAB_CI environment variable.
>
> Another option is we teach migration-test to detect whether memory_size of
> shmem is available, skip if not.  It can be a sequence of:
>
>   memfd_create()
>   fallocate()
>   ret = madvise(MADV_POPULATE_WRITE)
>
> To be run at the entry of migration-test, and skip all use_shmem=true tests
> if ret != 0, or any step failed above.

There are actually two issues:

1) Trying to run a test that needs more shmem than available in the
container. This is covered well by your suggestion.

2) Trying to use some shmem while another test has already consumed all
shmem. I'm not sure if this can be done reliably as the tests run in
parallel.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
  2024-05-27 15:11       ` Fabiano Rosas
@ 2024-05-27 15:41         ` Peter Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2024-05-27 15:41 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
	David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
	qemu-devel, Yury Kotov, Dr . David Alan Gilbert

On Mon, May 27, 2024 at 12:11:45PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> >> However, there is an issue here still on all archs - which might very
> >> well have been the original issue - which is the fact that the
> >> containers on the Gitlab CI have limits on shared memory usage.
> >> Unfortunately we cannot enable this test for the CI, so it needs a check
> >> on the GITLAB_CI environment variable.
> >
> > Another option is we teach migration-test to detect whether memory_size of
> > shmem is available, skip if not.  It can be a sequence of:
> >
> >   memfd_create()
> >   fallocate()
> >   ret = madvise(MADV_POPULATE_WRITE)
> >
> > To be run at the entry of migration-test, and skip all use_shmem=true tests
> > if ret != 0, or any step failed above.
> 
> There are actually two issues:
> 
> 1) Trying to run a test that needs more shmem than available in the
> container. This is covered well by your suggestion.
> 
> 2) Trying to use some shmem while another test has already consumed all
> shmem. I'm not sure if this can be done reliably as the tests run in
> parallel.

Maybe we can also make that check to be per-test, then when use_shmem=true
the test populates the shmem file before using, skip if population fails.
And if it succeeded, using that file in that test should be reliable.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-05-27 15:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
2024-05-26 15:53   ` David Hildenbrand
2024-05-26 19:44     ` Richard Henderson
2024-05-26 20:33       ` David Hildenbrand
2024-05-27  5:51         ` Thomas Huth
2024-05-27  7:11           ` Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
2024-05-27  5:46   ` Prasad Pandit
2024-05-27  7:04     ` Nicholas Piggin
2024-05-27  7:40       ` Prasad Pandit
2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
2024-05-27 12:42   ` Fabiano Rosas
2024-05-27 14:56     ` Peter Xu
2024-05-27 15:11       ` Fabiano Rosas
2024-05-27 15:41         ` Peter Xu
2024-05-27  7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).