* [PATCH] tests: increase timeout per instance of bios-tables-test
@ 2024-07-16 12:59 Igor Mammedov
2024-07-16 13:06 ` Michael S. Tsirkin
2024-07-22 7:35 ` Thomas Huth
0 siblings, 2 replies; 9+ messages in thread
From: Igor Mammedov @ 2024-07-16 12:59 UTC (permalink / raw)
To: qemu-devel; +Cc: thuth, lvivier, pbonzini, mst
CI often fails 'cross-i686-tci' job due to runner slowness
Log shows that test almost complete, with a few remaining
when bios-tables-test timeout hits:
19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
TIMEOUT 610.02s killed by signal 15 SIGTERM
...
stderr:
TAP parsing error: Too few tests run (expected 8, got 7)
At the same time overall job running time is only ~30 out of 1hr allowed.
Increase bios-tables-test instance timeout on 5min as a fix
for slow CI runners.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
tests/qtest/meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6508bfb1a2..ff9200f882 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,6 +1,6 @@
slow_qtests = {
'aspeed_smc-test': 360,
- 'bios-tables-test' : 610,
+ 'bios-tables-test' : 910,
'cdrom-test' : 610,
'device-introspect-test' : 720,
'migration-test' : 480,
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 12:59 [PATCH] tests: increase timeout per instance of bios-tables-test Igor Mammedov
@ 2024-07-16 13:06 ` Michael S. Tsirkin
2024-07-16 13:41 ` Peter Maydell
` (2 more replies)
2024-07-22 7:35 ` Thomas Huth
1 sibling, 3 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2024-07-16 13:06 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, thuth, lvivier, pbonzini
On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> CI often fails 'cross-i686-tci' job due to runner slowness
> Log shows that test almost complete, with a few remaining
> when bios-tables-test timeout hits:
>
> 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> TIMEOUT 610.02s killed by signal 15 SIGTERM
> ...
> stderr:
> TAP parsing error: Too few tests run (expected 8, got 7)
>
> At the same time overall job running time is only ~30 out of 1hr allowed.
>
> Increase bios-tables-test instance timeout on 5min as a fix
> for slow CI runners.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
We can't just keep increasing the timeout.
The issue is checking wall time on a busy host,
isn't it? Let's check CPU time instead.
> ---
> tests/qtest/meson.build | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 6508bfb1a2..ff9200f882 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,6 +1,6 @@
> slow_qtests = {
> 'aspeed_smc-test': 360,
> - 'bios-tables-test' : 610,
> + 'bios-tables-test' : 910,
> 'cdrom-test' : 610,
> 'device-introspect-test' : 720,
> 'migration-test' : 480,
> --
> 2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 13:06 ` Michael S. Tsirkin
@ 2024-07-16 13:41 ` Peter Maydell
2024-07-16 14:48 ` Igor Mammedov
2024-07-16 17:44 ` Thomas Huth
2 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2024-07-16 13:41 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Igor Mammedov, qemu-devel, thuth, lvivier, pbonzini
On Tue, 16 Jul 2024 at 14:07, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> >
> > 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> > TIMEOUT 610.02s killed by signal 15 SIGTERM
> > ...
> > stderr:
> > TAP parsing error: Too few tests run (expected 8, got 7)
> >
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> >
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> We can't just keep increasing the timeout.
> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.
The last time this came up Stefan had a look and reckoned
that the problem was that we try to run the qtests in
parallel when the CI host is only giving us one CPU:
https://lore.kernel.org/qemu-devel/9692cfcb-ef59-4cec-8452-8bfb859e8a6c@weilnetz.de/
thanks
-- PMM
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 13:06 ` Michael S. Tsirkin
2024-07-16 13:41 ` Peter Maydell
@ 2024-07-16 14:48 ` Igor Mammedov
2024-07-16 17:44 ` Thomas Huth
2 siblings, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2024-07-16 14:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, thuth, lvivier, pbonzini, Philippe Mathieu-Daudé,
Peter Maydell
On Tue, 16 Jul 2024 09:06:59 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> >
> > 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> > TIMEOUT 610.02s killed by signal 15 SIGTERM
> > ...
> > stderr:
> > TAP parsing error: Too few tests run (expected 8, got 7)
> >
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> >
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> We can't just keep increasing the timeout.
in this case I'm following precedent
https://gitlab.com/qemu-project/qemu/-/commit/a1f5a47b60d119859d974bed4d66db745448aac6
I'm not saying that the right approach (though seems to work for now)
> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.
It likely won't help as we still racing with wallclock
overall job timeout (which sometimes triggers failure too,
I guess it depends on stars alignment and load on the host).
Anyways, I don't have know-how when it comes to meson,
to do more than this patch.
with this patch 'cross-i686-tci' job passes for me,
but we have msys2-64bit failing atm due timeouts as well
(seems to be limited to sparc tests)
>
> > ---
> > tests/qtest/meson.build | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > index 6508bfb1a2..ff9200f882 100644
> > --- a/tests/qtest/meson.build
> > +++ b/tests/qtest/meson.build
> > @@ -1,6 +1,6 @@
> > slow_qtests = {
> > 'aspeed_smc-test': 360,
> > - 'bios-tables-test' : 610,
> > + 'bios-tables-test' : 910,
> > 'cdrom-test' : 610,
> > 'device-introspect-test' : 720,
> > 'migration-test' : 480,
> > --
> > 2.43.0
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 13:06 ` Michael S. Tsirkin
2024-07-16 13:41 ` Peter Maydell
2024-07-16 14:48 ` Igor Mammedov
@ 2024-07-16 17:44 ` Thomas Huth
2024-07-16 19:52 ` Peter Maydell
2 siblings, 1 reply; 9+ messages in thread
From: Thomas Huth @ 2024-07-16 17:44 UTC (permalink / raw)
To: Michael S. Tsirkin, Igor Mammedov; +Cc: qemu-devel, lvivier, pbonzini
On 16/07/2024 15.06, Michael S. Tsirkin wrote:
> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
>> CI often fails 'cross-i686-tci' job due to runner slowness
>> Log shows that test almost complete, with a few remaining
>> when bios-tables-test timeout hits:
>>
>> 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
>> TIMEOUT 610.02s killed by signal 15 SIGTERM
>> ...
>> stderr:
>> TAP parsing error: Too few tests run (expected 8, got 7)
>>
>> At the same time overall job running time is only ~30 out of 1hr allowed.
>>
>> Increase bios-tables-test instance timeout on 5min as a fix
>> for slow CI runners.
>>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> We can't just keep increasing the timeout.
> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.
The timeout setting comes from meson, not sure whether you can switch that
easily to use CPU time instead of wall time?
Anyway, if the bios-tables-test is getting more and more complex, it's maybe
not such a good idea to run it in a job that is using TCI ... Maybe it's
best to remove aarch64-softmmu from the cross-i686-tci job?
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 17:44 ` Thomas Huth
@ 2024-07-16 19:52 ` Peter Maydell
0 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2024-07-16 19:52 UTC (permalink / raw)
To: Thomas Huth
Cc: Michael S. Tsirkin, Igor Mammedov, qemu-devel, lvivier, pbonzini
On Tue, 16 Jul 2024 at 18:45, Thomas Huth <thuth@redhat.com> wrote:
>
> On 16/07/2024 15.06, Michael S. Tsirkin wrote:
> > On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> >> CI often fails 'cross-i686-tci' job due to runner slowness
> >> Log shows that test almost complete, with a few remaining
> >> when bios-tables-test timeout hits:
> >>
> >> 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> >> TIMEOUT 610.02s killed by signal 15 SIGTERM
> >> ...
> >> stderr:
> >> TAP parsing error: Too few tests run (expected 8, got 7)
> >>
> >> At the same time overall job running time is only ~30 out of 1hr allowed.
> >>
> >> Increase bios-tables-test instance timeout on 5min as a fix
> >> for slow CI runners.
> >>
> >> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >
> > We can't just keep increasing the timeout.
> > The issue is checking wall time on a busy host,
> > isn't it? Let's check CPU time instead.
>
> The timeout setting comes from meson, not sure whether you can switch that
> easily to use CPU time instead of wall time?
>
> Anyway, if the bios-tables-test is getting more and more complex, it's maybe
> not such a good idea to run it in a job that is using TCI ... Maybe it's
> best to remove aarch64-softmmu from the cross-i686-tci job?
It's one of the few tests that actually runs code in the guest:
we definitely shouldn't reduce the coverage of the actual TCI
part of the TCI job, I think.
I continue to think we need to find out why this CI job is
perpetually flaky and fix the underlying cause, not simply
increase timeouts or drop test cases or configs from it.
-- PMM
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-16 12:59 [PATCH] tests: increase timeout per instance of bios-tables-test Igor Mammedov
2024-07-16 13:06 ` Michael S. Tsirkin
@ 2024-07-22 7:35 ` Thomas Huth
2024-07-22 7:47 ` Michael S. Tsirkin
2024-07-22 9:50 ` Igor Mammedov
1 sibling, 2 replies; 9+ messages in thread
From: Thomas Huth @ 2024-07-22 7:35 UTC (permalink / raw)
To: Igor Mammedov, qemu-devel; +Cc: lvivier, pbonzini, mst
On 16/07/2024 14.59, Igor Mammedov wrote:
> CI often fails 'cross-i686-tci' job due to runner slowness
> Log shows that test almost complete, with a few remaining
> when bios-tables-test timeout hits:
>
> 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> TIMEOUT 610.02s killed by signal 15 SIGTERM
> ...
> stderr:
> TAP parsing error: Too few tests run (expected 8, got 7)
>
> At the same time overall job running time is only ~30 out of 1hr allowed.
>
> Increase bios-tables-test instance timeout on 5min as a fix
> for slow CI runners.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> tests/qtest/meson.build | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Since we're entering the freeze period this week, I'm going to pick up this
patch for my next pull request in the hope that it will help to get this job
green again during the freeze period. But in the long run, it would be
really good if someone familiar with the bios-tables-test could analyze why
the run time increased so much in recent times for this test and provide a
better fix for the problem.
Thanks,
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-22 7:35 ` Thomas Huth
@ 2024-07-22 7:47 ` Michael S. Tsirkin
2024-07-22 9:50 ` Igor Mammedov
1 sibling, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2024-07-22 7:47 UTC (permalink / raw)
To: Thomas Huth; +Cc: Igor Mammedov, qemu-devel, lvivier, pbonzini
On Mon, Jul 22, 2024 at 09:35:17AM +0200, Thomas Huth wrote:
> On 16/07/2024 14.59, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> >
> > 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> > TIMEOUT 610.02s killed by signal 15 SIGTERM
> > ...
> > stderr:
> > TAP parsing error: Too few tests run (expected 8, got 7)
> >
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> >
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > tests/qtest/meson.build | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Since we're entering the freeze period this week, I'm going to pick up this
> patch for my next pull request in the hope that it will help to get this job
> green again during the freeze period. But in the long run, it would be
> really good if someone familiar with the bios-tables-test could analyze why
> the run time increased so much in recent times for this test and provide a
> better fix for the problem.
>
> Thanks,
> Thomas
>
OK.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tests: increase timeout per instance of bios-tables-test
2024-07-22 7:35 ` Thomas Huth
2024-07-22 7:47 ` Michael S. Tsirkin
@ 2024-07-22 9:50 ` Igor Mammedov
1 sibling, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2024-07-22 9:50 UTC (permalink / raw)
To: Thomas Huth; +Cc: qemu-devel, lvivier, pbonzini, mst, Peter Maydell
On Mon, 22 Jul 2024 09:35:17 +0200
Thomas Huth <thuth@redhat.com> wrote:
> On 16/07/2024 14.59, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> >
> > 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> > TIMEOUT 610.02s killed by signal 15 SIGTERM
> > ...
> > stderr:
> > TAP parsing error: Too few tests run (expected 8, got 7)
> >
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> >
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > tests/qtest/meson.build | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Since we're entering the freeze period this week, I'm going to pick up this
> patch for my next pull request in the hope that it will help to get this job
> green again during the freeze period. But in the long run, it would be
> really good if someone familiar with the bios-tables-test could analyze why
> the run time increased so much in recent times for this test and provide a
> better fix for the problem.
running UEFI guest under TCG was always slow (more so for aarch64),
but we keep adding more sub-cases to bios-tables-test (each running
firmware), and that takes extra time.
Overall time to run bios-tables-test naturally increases,
so we have to increase timeout eventually regardless of everything else.
(however there has not been new tests added since 9.0)
In cases I've looked at, meson timed out at the end of bios-tables-test,
where almost all cases were executed modulo the last.
Given flakiness of failure and changes to sub-cases, a few things
could lead to it:
* performance regression when executing aarch64 tests, not much but enough
to push the last sub-test behind 10min boundary more often than not
(either qemu or firmware) (maybe running a few 9.0 jobs and comparing
them to the current master would show if guest became slower)
* CI job runners became slower (we can't control that if I'm not wrong).
As was suggested in another thread running less test in parallel
might help. But I won't bet on it, since it the end (by the time close
to timeout) only a few test are still running so it might be not much
contention on CPU resources.
>
> Thanks,
> Thomas
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-07-22 9:51 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-16 12:59 [PATCH] tests: increase timeout per instance of bios-tables-test Igor Mammedov
2024-07-16 13:06 ` Michael S. Tsirkin
2024-07-16 13:41 ` Peter Maydell
2024-07-16 14:48 ` Igor Mammedov
2024-07-16 17:44 ` Thomas Huth
2024-07-16 19:52 ` Peter Maydell
2024-07-22 7:35 ` Thomas Huth
2024-07-22 7:47 ` Michael S. Tsirkin
2024-07-22 9:50 ` Igor Mammedov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).