* [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests
@ 2024-05-25 13:12 Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
To: qemu-s390x
Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel
I don't know s390x enough to know if this is the right fix, but I
could debug the migration hangs this far at least (and the patch
fixes the condition that would previously result in a hang on the
qtest).
Also we could enable the test_ignore_shared test that seems to work
on s390x and ppc64 at least.
Thanks,
Nick
Nicholas Piggin (3):
hw/intc/s390_flic: Migrate pending state
tests/qtest/migration-test: enable on s390x
tests/qtest/migration-test: Enable test_ignore_shared
hw/intc/s390_flic.c | 1 +
tests/qtest/migration-test.c | 25 ++++++++-----------------
2 files changed, 9 insertions(+), 17 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
2024-05-26 15:53 ` David Hildenbrand
2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
To: qemu-s390x
Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel
The flic pending state is not migrated, so if the machine is migrated
while an interrupt is pending, it can be lost. This shows up in
qtest migration test, an extint is pending (due to console writes?)
and the CPU waits via s390_cpu_set_psw and expects the interrupt to
wake it. However when the flic pending state is lost, s390_cpu_has_int
returns false, so s390_cpu_exec_interrupt falls through to halting
again.
Fix this by migrating pending. This prevents the qtest from hanging.
Does service_param need to be migrated? Or the IO lists?
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/intc/s390_flic.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6771645699..b70cf2295a 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
.fields = (const VMStateField[]) {
VMSTATE_UINT8(simm, QEMUS390FLICState),
VMSTATE_UINT8(nimm, QEMUS390FLICState),
+ VMSTATE_UINT32(pending, QEMUS390FLICState),
VMSTATE_END_OF_LIST()
}
};
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
2024-05-27 5:46 ` Prasad Pandit
2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
2024-05-27 7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth
3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
To: qemu-s390x
Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel
s390x is more stable now. Enable it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/qtest/migration-test.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 94d5057857..7987faaded 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
migration_test_add("/migration/analyze-script", test_analyze_script);
#endif
- /*
- * On s390x, the test seems to be touchy with TCG, perhaps due to race
- * conditions on dirty bits, so disable it there until the problems are
- * resolved.
- */
- if (g_str_equal(arch, "s390x") && !has_kvm) {
- g_test_message("Skipping tests: s390x host with KVM is required");
- goto test_add_done;
- }
-
if (is_x86) {
migration_test_add("/migration/precopy/unix/suspend/live",
test_precopy_unix_suspend_live);
@@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
test_vcpu_dirty_limit);
}
-test_add_done:
-
ret = g_test_run();
g_assert_cmpint(ret, ==, 0);
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
@ 2024-05-25 13:12 ` Nicholas Piggin
2024-05-27 12:42 ` Fabiano Rosas
2024-05-27 7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth
3 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-25 13:12 UTC (permalink / raw)
To: qemu-s390x
Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel, Yury Kotov,
Dr . David Alan Gilbert
This was said to be broken on aarch64, but if it works on others,
let's try enable it. It's already starting to bitrot...
Cc: Yury Kotov <yury-kotov@yandex-team.ru>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/qtest/migration-test.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7987faaded..2bcdc33b7c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
#endif /* CONFIG_TASN1 */
#endif /* CONFIG_GNUTLS */
-#if 0
-/* Currently upset on aarch64 TCG */
static void test_ignore_shared(void)
{
g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
QTestState *from, *to;
+ MigrateStart args = {
+ .use_shmem = true,
+ };
- if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
+ if (test_migrate_start(&from, &to, uri, &args)) {
return;
}
@@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
test_migrate_end(from, to, true);
}
-#endif
static void *
test_migrate_xbzrle_start(QTestState *from,
@@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
#endif /* CONFIG_TASN1 */
#endif /* CONFIG_GNUTLS */
- /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
+ if (strcmp(arch, "aarch64") == 0) { /* Currently upset on aarch64 TCG */
+ migration_test_add("/migration/ignore_shared", test_ignore_shared);
+ }
+
#ifndef _WIN32
migration_test_add("/migration/precopy/fd/tcp",
test_migrate_precopy_fd_socket);
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
@ 2024-05-26 15:53 ` David Hildenbrand
2024-05-26 19:44 ` Richard Henderson
0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-05-26 15:53 UTC (permalink / raw)
To: Nicholas Piggin, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel
Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> The flic pending state is not migrated, so if the machine is migrated
> while an interrupt is pending, it can be lost. This shows up in
> qtest migration test, an extint is pending (due to console writes?)
> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> wake it. However when the flic pending state is lost, s390_cpu_has_int
> returns false, so s390_cpu_exec_interrupt falls through to halting
> again.
>
> Fix this by migrating pending. This prevents the qtest from hanging.
> Does service_param need to be migrated? Or the IO lists?
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> hw/intc/s390_flic.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> index 6771645699..b70cf2295a 100644
> --- a/hw/intc/s390_flic.c
> +++ b/hw/intc/s390_flic.c
> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
> .fields = (const VMStateField[]) {
> VMSTATE_UINT8(simm, QEMUS390FLICState),
> VMSTATE_UINT8(nimm, QEMUS390FLICState),
> + VMSTATE_UINT32(pending, QEMUS390FLICState),
> VMSTATE_END_OF_LIST()
> }
> };
Likely you have to handle this using QEMU compat machines.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-26 15:53 ` David Hildenbrand
@ 2024-05-26 19:44 ` Richard Henderson
2024-05-26 20:33 ` David Hildenbrand
0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2024-05-26 19:44 UTC (permalink / raw)
To: David Hildenbrand, Nicholas Piggin, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel
On 5/26/24 08:53, David Hildenbrand wrote:
> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>> The flic pending state is not migrated, so if the machine is migrated
>> while an interrupt is pending, it can be lost. This shows up in
>> qtest migration test, an extint is pending (due to console writes?)
>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>> returns false, so s390_cpu_exec_interrupt falls through to halting
>> again.
>>
>> Fix this by migrating pending. This prevents the qtest from hanging.
>> Does service_param need to be migrated? Or the IO lists?
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> hw/intc/s390_flic.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>> index 6771645699..b70cf2295a 100644
>> --- a/hw/intc/s390_flic.c
>> +++ b/hw/intc/s390_flic.c
>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>> .fields = (const VMStateField[]) {
>> VMSTATE_UINT8(simm, QEMUS390FLICState),
>> VMSTATE_UINT8(nimm, QEMUS390FLICState),
>> + VMSTATE_UINT32(pending, QEMUS390FLICState),
>> VMSTATE_END_OF_LIST()
>> }
>> };
>
> Likely you have to handle this using QEMU compat machines.
Well, since existing migration is broken, I don't think you have to preserve
compatibility. But you do have to bump the version number.
r~
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-26 19:44 ` Richard Henderson
@ 2024-05-26 20:33 ` David Hildenbrand
2024-05-27 5:51 ` Thomas Huth
0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2024-05-26 20:33 UTC (permalink / raw)
To: Richard Henderson, Nicholas Piggin, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, Thomas Huth, Peter Xu,
Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel
Am 26.05.24 um 21:44 schrieb Richard Henderson:
> On 5/26/24 08:53, David Hildenbrand wrote:
>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>> The flic pending state is not migrated, so if the machine is migrated
>>> while an interrupt is pending, it can be lost. This shows up in
>>> qtest migration test, an extint is pending (due to console writes?)
>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>> again.
>>>
>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>> Does service_param need to be migrated? Or the IO lists?
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>> hw/intc/s390_flic.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>> index 6771645699..b70cf2295a 100644
>>> --- a/hw/intc/s390_flic.c
>>> +++ b/hw/intc/s390_flic.c
>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>> .fields = (const VMStateField[]) {
>>> VMSTATE_UINT8(simm, QEMUS390FLICState),
>>> VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>> + VMSTATE_UINT32(pending, QEMUS390FLICState),
>>> VMSTATE_END_OF_LIST()
>>> }
>>> };
>>
>> Likely you have to handle this using QEMU compat machines.
>
> Well, since existing migration is broken, I don't think you have to preserve
Migration is broken only in some case "while an interrupt is pending, it can be
lost".
> compatibility. But you do have to bump the version number.
Looking at it, this is TCG only, so likely we don't care that much about
migration compatibility. But I have no idea what level of compatibility we want
to support there.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
@ 2024-05-27 5:46 ` Prasad Pandit
2024-05-27 7:04 ` Nicholas Piggin
0 siblings, 1 reply; 17+ messages in thread
From: Prasad Pandit @ 2024-05-27 5:46 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
Paolo Bonzini, qemu-devel
Hi,
On Sat, 25 May 2024 at 18:44, Nicholas Piggin <npiggin@gmail.com> wrote:
> s390x is more stable now. Enable it.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/qtest/migration-test.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 94d5057857..7987faaded 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
> migration_test_add("/migration/analyze-script", test_analyze_script);
> #endif
>
> - /*
> - * On s390x, the test seems to be touchy with TCG, perhaps due to race
> - * conditions on dirty bits, so disable it there until the problems are
> - * resolved.
> - */
-> https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg04774.html
* Above patch (not reviewed yet) adds comment about sporadic problems
on s390x, and this patch says s390x is stable now? It'll help to
mention in the commit log - what changed to make it stable in 1 day.
* IIUC, this and the ppc64 patch above enable 'migration-test' for
s390x and ppc64 platforms, when KVM is not available for them? ie.
When running s390x & ppc64 migration-tests with TCG on non s390x or
non-ppc64 machines, right? Maybe the commit message could say
something to the effect of - enable s390x and ppc64 'migration-test'
to run with TCG across platforms where KVM for s390x OR KVM for
ppc64 is not available.
> - if (g_str_equal(arch, "s390x") && !has_kvm) {
> - g_test_message("Skipping tests: s390x host with KVM is required");
> - goto test_add_done;
> - }
> -
> if (is_x86) {
> migration_test_add("/migration/precopy/unix/suspend/live",
> test_precopy_unix_suspend_live);
> @@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
> test_vcpu_dirty_limit);
> }
>
> -test_add_done:
> -
> ret = g_test_run();
>
> g_assert_cmpint(ret, ==, 0);
> --
* Otherwise, the change looks okay to enable 'migration-test' for
s390x with TCG IIUC.
Thank you.
---
- Prasad
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-26 20:33 ` David Hildenbrand
@ 2024-05-27 5:51 ` Thomas Huth
2024-05-27 7:11 ` Nicholas Piggin
0 siblings, 1 reply; 17+ messages in thread
From: Thomas Huth @ 2024-05-27 5:51 UTC (permalink / raw)
To: David Hildenbrand, Richard Henderson, Nicholas Piggin, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel
On 26/05/2024 22.33, David Hildenbrand wrote:
> Am 26.05.24 um 21:44 schrieb Richard Henderson:
>> On 5/26/24 08:53, David Hildenbrand wrote:
>>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>>> The flic pending state is not migrated, so if the machine is migrated
>>>> while an interrupt is pending, it can be lost. This shows up in
>>>> qtest migration test, an extint is pending (due to console writes?)
>>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>>> again.
>>>>
>>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>>> Does service_param need to be migrated? Or the IO lists?
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>> hw/intc/s390_flic.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>>> index 6771645699..b70cf2295a 100644
>>>> --- a/hw/intc/s390_flic.c
>>>> +++ b/hw/intc/s390_flic.c
>>>> @@ -369,6 +369,7 @@ static const VMStateDescription
>>>> qemu_s390_flic_vmstate = {
>>>> .fields = (const VMStateField[]) {
>>>> VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>> VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>>> + VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>> VMSTATE_END_OF_LIST()
>>>> }
>>>> };
>>>
>>> Likely you have to handle this using QEMU compat machines.
>>
>> Well, since existing migration is broken, I don't think you have to preserve
>
> Migration is broken only in some case "while an interrupt is pending, it can
> be lost".
>
>> compatibility. But you do have to bump the version number.
>
> Looking at it, this is TCG only, so likely we don't care that much about
> migration compatibility. But I have no idea what level of compatibility we
> want to support there.
Yes, this seems to only affect the TCG-only flic device, where migration has
never been working very reliably. So I think we don't really need the whole
compat-machine dance here. But I think we should at least bump the
version_id to 2 now and then use
VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);
for the new field. That way we would at least support forward migrations
without too much hassle.
Thomas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
2024-05-27 5:46 ` Prasad Pandit
@ 2024-05-27 7:04 ` Nicholas Piggin
2024-05-27 7:40 ` Prasad Pandit
0 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-27 7:04 UTC (permalink / raw)
To: Prasad Pandit
Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
Paolo Bonzini, qemu-devel
On Mon May 27, 2024 at 3:46 PM AEST, Prasad Pandit wrote:
> Hi,
>
> On Sat, 25 May 2024 at 18:44, Nicholas Piggin <npiggin@gmail.com> wrote:
> > s390x is more stable now. Enable it.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > tests/qtest/migration-test.c | 12 ------------
> > 1 file changed, 12 deletions(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 94d5057857..7987faaded 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -3428,16 +3428,6 @@ int main(int argc, char **argv)
> > migration_test_add("/migration/analyze-script", test_analyze_script);
> > #endif
> >
> > - /*
> > - * On s390x, the test seems to be touchy with TCG, perhaps due to race
> > - * conditions on dirty bits, so disable it there until the problems are
> > - * resolved.
> > - */
>
> -> https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg04774.html
>
> * Above patch (not reviewed yet) adds comment about sporadic problems
> on s390x, and this patch says s390x is stable now? It'll help to
> mention in the commit log - what changed to make it stable in 1 day.
Patch 1 of this series.
> * IIUC, this and the ppc64 patch above enable 'migration-test' for
> s390x and ppc64 platforms, when KVM is not available for them? ie.
> When running s390x & ppc64 migration-tests with TCG on non s390x or
> non-ppc64 machines, right? Maybe the commit message could say
> something to the effect of - enable s390x and ppc64 'migration-test'
> to run with TCG across platforms where KVM for s390x OR KVM for
> ppc64 is not available.
Yes they should be called "enable for TCG" indeed.
> > - if (g_str_equal(arch, "s390x") && !has_kvm) {
> > - g_test_message("Skipping tests: s390x host with KVM is required");
> > - goto test_add_done;
> > - }
> > -
> > if (is_x86) {
> > migration_test_add("/migration/precopy/unix/suspend/live",
> > test_precopy_unix_suspend_live);
> > @@ -3619,8 +3609,6 @@ int main(int argc, char **argv)
> > test_vcpu_dirty_limit);
> > }
> >
> > -test_add_done:
> > -
> > ret = g_test_run();
> >
> > g_assert_cmpint(ret, ==, 0);
> > --
>
> * Otherwise, the change looks okay to enable 'migration-test' for
> s390x with TCG IIUC.
Thanks,
Nick
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
2024-05-27 5:51 ` Thomas Huth
@ 2024-05-27 7:11 ` Nicholas Piggin
0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2024-05-27 7:11 UTC (permalink / raw)
To: Thomas Huth, David Hildenbrand, Richard Henderson, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, Peter Xu, Fabiano Rosas,
Laurent Vivier, Paolo Bonzini, qemu-devel
On Mon May 27, 2024 at 3:51 PM AEST, Thomas Huth wrote:
> On 26/05/2024 22.33, David Hildenbrand wrote:
> > Am 26.05.24 um 21:44 schrieb Richard Henderson:
> >> On 5/26/24 08:53, David Hildenbrand wrote:
> >>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> >>>> The flic pending state is not migrated, so if the machine is migrated
> >>>> while an interrupt is pending, it can be lost. This shows up in
> >>>> qtest migration test, an extint is pending (due to console writes?)
> >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
> >>>> returns false, so s390_cpu_exec_interrupt falls through to halting
> >>>> again.
> >>>>
> >>>> Fix this by migrating pending. This prevents the qtest from hanging.
> >>>> Does service_param need to be migrated? Or the IO lists?
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>> hw/intc/s390_flic.c | 1 +
> >>>> 1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> >>>> index 6771645699..b70cf2295a 100644
> >>>> --- a/hw/intc/s390_flic.c
> >>>> +++ b/hw/intc/s390_flic.c
> >>>> @@ -369,6 +369,7 @@ static const VMStateDescription
> >>>> qemu_s390_flic_vmstate = {
> >>>> .fields = (const VMStateField[]) {
> >>>> VMSTATE_UINT8(simm, QEMUS390FLICState),
> >>>> VMSTATE_UINT8(nimm, QEMUS390FLICState),
> >>>> + VMSTATE_UINT32(pending, QEMUS390FLICState),
> >>>> VMSTATE_END_OF_LIST()
> >>>> }
> >>>> };
> >>>
> >>> Likely you have to handle this using QEMU compat machines.
> >>
> >> Well, since existing migration is broken, I don't think you have to preserve
> >
> > Migration is broken only in some case "while an interrupt is pending, it can
> > be lost".
> >
> >> compatibility. But you do have to bump the version number.
> >
> > Looking at it, this is TCG only, so likely we don't care that much about
> > migration compatibility. But I have no idea what level of compatibility we
> > want to support there.
>
> Yes, this seems to only affect the TCG-only flic device, where migration has
> never been working very reliably. So I think we don't really need the whole
> compat-machine dance here. But I think we should at least bump the
> version_id to 2 now and then use
>
> VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);
>
> for the new field. That way we would at least support forward migrations
> without too much hassle.
Well if you could rebuild this state by checking sources or something
it might be possible to avoid. Or if you could always mark pending and
software can tolerate superflous. But that also seems like a lot of
headache for something that was always flaky.
Thanks,
Nick
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
` (2 preceding siblings ...)
2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
@ 2024-05-27 7:13 ` Thomas Huth
3 siblings, 0 replies; 17+ messages in thread
From: Thomas Huth @ 2024-05-27 7:13 UTC (permalink / raw)
To: Nicholas Piggin, qemu-s390x
Cc: Halil Pasic, Christian Borntraeger, David Hildenbrand, Peter Xu,
Fabiano Rosas, Laurent Vivier, Paolo Bonzini, qemu-devel
On 25/05/2024 15.12, Nicholas Piggin wrote:
> I don't know s390x enough to know if this is the right fix, but I
> could debug the migration hangs this far at least (and the patch
> fixes the condition that would previously result in a hang on the
> qtest).
I've run this now for a while, and the hang seems to be gone, indeed!
Pretty cool, thanks for digging into this!
Tested-by: Thomas Huth <thuth@redhat.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x
2024-05-27 7:04 ` Nicholas Piggin
@ 2024-05-27 7:40 ` Prasad Pandit
0 siblings, 0 replies; 17+ messages in thread
From: Prasad Pandit @ 2024-05-27 7:40 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-s390x, Halil Pasic, Christian Borntraeger, David Hildenbrand,
Thomas Huth, Peter Xu, Fabiano Rosas, Laurent Vivier,
Paolo Bonzini, qemu-devel
On Mon, 27 May 2024 at 12:34, Nicholas Piggin <npiggin@gmail.com> wrote:
> > * Above patch (not reviewed yet) adds comment about sporadic problems
> > on s390x, and this patch says s390x is stable now? It'll help to
> > mention in the commit log - what changed to make it stable in 1 day.
>
> Patch 1 of this series.
* Ah, that one got filtered to another folder. It'll help to add
something about the 'flic pending' case being specific to migration of
TCG device and so it's reasonable to remove 'has_kvm' check and enable
migration-test to run with TCG.
> Yes they should be called "enable for TCG" indeed.
>
Ack with commit message changes:
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Thank you.
---
- Prasad
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
@ 2024-05-27 12:42 ` Fabiano Rosas
2024-05-27 14:56 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: Fabiano Rosas @ 2024-05-27 12:42 UTC (permalink / raw)
To: Nicholas Piggin, qemu-s390x
Cc: Nicholas Piggin, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Peter Xu, Laurent Vivier,
Paolo Bonzini, qemu-devel, Yury Kotov, Dr . David Alan Gilbert
Nicholas Piggin <npiggin@gmail.com> writes:
> This was said to be broken on aarch64, but if it works on others,
> let's try enable it. It's already starting to bitrot...
Yeah, look at the state of this...
I don't know what the issue was on aarch64, but I'm all for enabling
this test globally and then we deal with the breakage if it ever
comes. I don't think it will.
However, there is an issue here still on all archs - which might very
well have been the original issue - which is the fact that the
containers on the Gitlab CI have limits on shared memory usage.
Unfortunately we cannot enable this test for the CI, so it needs a check
on the GITLAB_CI environment variable.
There's also the cpr-reboot test which got put under "flaky", that has
the same issue. That one should also have been under GITLAB_CI. From
that discussion:
"We have an issue with this test on CI:
$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 64M 0 64M 0% /dev/shm
These are shared CI runners, so AFAICT there's no way to increase the
shared memory size.
Reducing the memory for this single test also wouldn't work because we
can run migration-test for different archs in parallel + there's the
ivshmem_test which uses 4M.
Maybe just leave it out of CI? Laptops will probably have enough shared
memory to not hit this. If we add a warning comment to the test, might
be enough." -- https://lore.kernel.org/all/87ttq5fvh7.fsf@suse.de
>
> Cc: Yury Kotov <yury-kotov@yandex-team.ru>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/qtest/migration-test.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7987faaded..2bcdc33b7c 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
> #endif /* CONFIG_TASN1 */
> #endif /* CONFIG_GNUTLS */
>
> -#if 0
> -/* Currently upset on aarch64 TCG */
> static void test_ignore_shared(void)
> {
> g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> QTestState *from, *to;
> + MigrateStart args = {
> + .use_shmem = true,
> + };
>
> - if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
> + if (test_migrate_start(&from, &to, uri, &args)) {
> return;
> }
>
> @@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
>
> test_migrate_end(from, to, true);
> }
> -#endif
>
> static void *
> test_migrate_xbzrle_start(QTestState *from,
> @@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
> #endif /* CONFIG_TASN1 */
> #endif /* CONFIG_GNUTLS */
>
> - /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
> + if (strcmp(arch, "aarch64") == 0) { /* Currently upset on aarch64 TCG */
> + migration_test_add("/migration/ignore_shared", test_ignore_shared);
> + }
> +
> #ifndef _WIN32
> migration_test_add("/migration/precopy/fd/tcp",
> test_migrate_precopy_fd_socket);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
2024-05-27 12:42 ` Fabiano Rosas
@ 2024-05-27 14:56 ` Peter Xu
2024-05-27 15:11 ` Fabiano Rosas
0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2024-05-27 14:56 UTC (permalink / raw)
To: Fabiano Rosas
Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
qemu-devel, Yury Kotov, Dr . David Alan Gilbert
On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> However, there is an issue here still on all archs - which might very
> well have been the original issue - which is the fact that the
> containers on the Gitlab CI have limits on shared memory usage.
> Unfortunately we cannot enable this test for the CI, so it needs a check
> on the GITLAB_CI environment variable.
Another option is we teach migration-test to detect whether memory_size of
shmem is available, skip if not. It can be a sequence of:
memfd_create()
fallocate()
ret = madvise(MADV_POPULATE_WRITE)
To be run at the entry of migration-test, and skip all use_shmem=true tests
if ret != 0, or any step failed above.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
2024-05-27 14:56 ` Peter Xu
@ 2024-05-27 15:11 ` Fabiano Rosas
2024-05-27 15:41 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: Fabiano Rosas @ 2024-05-27 15:11 UTC (permalink / raw)
To: Peter Xu
Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
qemu-devel, Yury Kotov, Dr . David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
>> However, there is an issue here still on all archs - which might very
>> well have been the original issue - which is the fact that the
>> containers on the Gitlab CI have limits on shared memory usage.
>> Unfortunately we cannot enable this test for the CI, so it needs a check
>> on the GITLAB_CI environment variable.
>
> Another option is we teach migration-test to detect whether memory_size of
> shmem is available, skip if not. It can be a sequence of:
>
> memfd_create()
> fallocate()
> ret = madvise(MADV_POPULATE_WRITE)
>
> To be run at the entry of migration-test, and skip all use_shmem=true tests
> if ret != 0, or any step failed above.
There are actually two issues:
1) Trying to run a test that needs more shmem than available in the
container. This is covered well by your suggestion.
2) Trying to use some shmem while another test has already consumed all
shmem. I'm not sure if this can be done reliably as the tests run in
parallel.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared
2024-05-27 15:11 ` Fabiano Rosas
@ 2024-05-27 15:41 ` Peter Xu
0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2024-05-27 15:41 UTC (permalink / raw)
To: Fabiano Rosas
Cc: Nicholas Piggin, qemu-s390x, Halil Pasic, Christian Borntraeger,
David Hildenbrand, Thomas Huth, Laurent Vivier, Paolo Bonzini,
qemu-devel, Yury Kotov, Dr . David Alan Gilbert
On Mon, May 27, 2024 at 12:11:45PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> >> However, there is an issue here still on all archs - which might very
> >> well have been the original issue - which is the fact that the
> >> containers on the Gitlab CI have limits on shared memory usage.
> >> Unfortunately we cannot enable this test for the CI, so it needs a check
> >> on the GITLAB_CI environment variable.
> >
> > Another option is we teach migration-test to detect whether memory_size of
> > shmem is available, skip if not. It can be a sequence of:
> >
> > memfd_create()
> > fallocate()
> > ret = madvise(MADV_POPULATE_WRITE)
> >
> > To be run at the entry of migration-test, and skip all use_shmem=true tests
> > if ret != 0, or any step failed above.
>
> There are actually two issues:
>
> 1) Trying to run a test that needs more shmem than available in the
> container. This is covered well by your suggestion.
>
> 2) Trying to use some shmem while another test has already consumed all
> shmem. I'm not sure if this can be done reliably as the tests run in
> parallel.
Maybe we can also make that check to be per-test, then when use_shmem=true
the test populates the shmem file before using, skip if population fails.
And if it succeeded, using that file in that test should be reliable.
--
Peter Xu
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-05-27 15:41 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-25 13:12 [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state Nicholas Piggin
2024-05-26 15:53 ` David Hildenbrand
2024-05-26 19:44 ` Richard Henderson
2024-05-26 20:33 ` David Hildenbrand
2024-05-27 5:51 ` Thomas Huth
2024-05-27 7:11 ` Nicholas Piggin
2024-05-25 13:12 ` [RFC PATCH 2/3] tests/qtest/migration-test: enable on s390x Nicholas Piggin
2024-05-27 5:46 ` Prasad Pandit
2024-05-27 7:04 ` Nicholas Piggin
2024-05-27 7:40 ` Prasad Pandit
2024-05-25 13:12 ` [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared Nicholas Piggin
2024-05-27 12:42 ` Fabiano Rosas
2024-05-27 14:56 ` Peter Xu
2024-05-27 15:11 ` Fabiano Rosas
2024-05-27 15:41 ` Peter Xu
2024-05-27 7:13 ` [RFC PATCH 0/3] Fix s390x flic migration and add some more qtests Thomas Huth
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).