* [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Peter Maydell @ 2024-09-12 15:10 UTC
To: qemu-devel; +Cc: Thomas Huth
The cross-i686-tci CI job is persistently flaky with various tests
hitting timeouts. One theory for why this is happening is that we're
running too many tests in parallel and so sometimes a test gets
starved of CPU and isn't able to complete within the timeout.

(The environment this CI job runs in seems to cause us to default
to a parallelism of 9 in the main CI.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
If this works we might be able to wind this up to -j2 or -j3,
and/or consider whether other CI jobs need something similar.
---
.gitlab-ci.d/crossbuilds.yml | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 459273f9da5..1e21d082aa4 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -62,7 +62,11 @@ cross-i686-tci:
IMAGE: debian-i686-cross
ACCEL: tcg-interpreter
EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,aarch64-softmmu,aarch64-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm
- MAKE_CHECK_ARGS: check check-tcg
+ # Force tests to run in series, to see whether this
+ # reduces the flakiness of this CI job. The CI
+ # environment by default shows us 8 CPUs and so we
+ # would otherwise be using a parallelism of 9.
+ MAKE_CHECK_ARGS: check check-tcg -j1
cross-mipsel-system:
extends: .cross_system_build_job
--
2.34.1
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Thomas Huth @ 2024-09-12 16:48 UTC
To: Peter Maydell, qemu-devel
On 12/09/2024 17.10, Peter Maydell wrote:
> The cross-i686-tci CI job is persistently flaky with various tests
> hitting timeouts. One theory for why this is happening is that we're
> running too many tests in parallel and so sometimes a test gets
> starved of CPU and isn't able to complete within the timeout.
>
> (The environment this CI job runs in seems to cause us to default
> to a parallelism of 9 in the main CI.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> If this works we might be able to wind this up to -j2 or -j3,
> and/or consider whether other CI jobs need something similar.
As a start, we could also try replacing the
JOBS=$(expr $(nproc) + 1)
with
JOBS=$(nproc)
in the buildtest-template.yml file...?
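For illustration, the two formulas can be compared with a small shell sketch. It assumes, as the thread implies, that the template passes `$JOBS` to `make -j`; the `jobs_for` function and its policy names are hypothetical and not part of the QEMU tree.

```shell
# Hypothetical helper: compute the make parallelism for a given CPU count.
# $1 = number of CPUs (defaults to the output of nproc)
# $2 = policy: "plus-one" (current template) or "exact" (suggested change)
jobs_for() {
    cpus=${1:-$(nproc)}
    policy=${2:-plus-one}
    case "$policy" in
        plus-one) expr "$cpus" + 1 ;;   # JOBS=$(expr $(nproc) + 1)
        exact)    echo "$cpus" ;;       # JOBS=$(nproc)
        *)        echo "unknown policy: $policy" >&2; return 1 ;;
    esac
}

# Compare both policies for a 2-vCPU runner and an 8-CPU runner.
for n in 2 8; do
    echo "$n CPUs: plus-one=$(jobs_for "$n" plus-one) exact=$(jobs_for "$n" exact)"
done
```

With 8 CPUs visible this reproduces the -j9 seen in the main CI, and `$(nproc)` would give -j8; on a 2-vCPU runner the two policies give -j3 and -j2 respectively.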
> ---
> .gitlab-ci.d/crossbuilds.yml | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
> index 459273f9da5..1e21d082aa4 100644
> --- a/.gitlab-ci.d/crossbuilds.yml
> +++ b/.gitlab-ci.d/crossbuilds.yml
> @@ -62,7 +62,11 @@ cross-i686-tci:
> IMAGE: debian-i686-cross
> ACCEL: tcg-interpreter
> EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,aarch64-softmmu,aarch64-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm
> - MAKE_CHECK_ARGS: check check-tcg
> + # Force tests to run in series, to see whether this
> + # reduces the flakiness of this CI job. The CI
> + # environment by default shows us 8 CPUs and so we
> + # would otherwise be using a parallelism of 9.
> + MAKE_CHECK_ARGS: check check-tcg -j1
Reviewed-by: Thomas Huth <thuth@redhat.com>
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Peter Maydell @ 2024-09-13 12:24 UTC
To: qemu-devel; +Cc: Thomas Huth
On Thu, 12 Sept 2024 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> The cross-i686-tci CI job is persistently flaky with various tests
> hitting timeouts. One theory for why this is happening is that we're
> running too many tests in parallel and so sometimes a test gets
> starved of CPU and isn't able to complete within the timeout.
>
> (The environment this CI job runs in seems to cause us to default
> to a parallelism of 9 in the main CI.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> If this works we might be able to wind this up to -j2 or -j3,
> and/or consider whether other CI jobs need something similar.
I gave this a try, but unfortunately the result seems to be
that the whole job times out:
https://gitlab.com/qemu-project/qemu/-/jobs/7818441897
Maybe we could try a compromise of -j3 or thereabouts...
-- PMM
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Peter Maydell @ 2024-09-13 13:31 UTC
To: qemu-devel; +Cc: Thomas Huth
On Fri, 13 Sept 2024 at 13:24, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 12 Sept 2024 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > The cross-i686-tci CI job is persistently flaky with various tests
> > hitting timeouts. One theory for why this is happening is that we're
> > running too many tests in parallel and so sometimes a test gets
> > starved of CPU and isn't able to complete within the timeout.
> >
> > (The environment this CI job runs in seems to cause us to default
> > to a parallelism of 9 in the main CI.)
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > If this works we might be able to wind this up to -j2 or -j3,
> > and/or consider whether other CI jobs need something similar.
>
> I gave this a try, but unfortunately the result seems to be
> that the whole job times out:
> https://gitlab.com/qemu-project/qemu/-/jobs/7818441897
...but then this simple retry passed with a runtime of 47 mins:
https://gitlab.com/qemu-project/qemu/-/jobs/7819225200
I'm tempted to commit this as-is and see whether it helps.
If it doesn't, I can always back it off to -j2; and if it does
generate a lot of full-job timeouts, it's only me it's annoying.
Looking at the timed-out job, it seems it just took a lot
longer in the compile phase... (Though it's hard to say, because
the fact that we use "make all check-build" in our GitLab CI config
means GitLab treats this as all one step when it adds time
annotations, and you can't separate time-for-compile from
time-for-tests.)
-- PMM
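A possible way to separate compile time from test time in the log is to run them as separate commands wrapped in GitLab's collapsible-section markers (`section_start:<timestamp>:<name>` and `section_end:<timestamp>:<name>`). The sketch below assumes the job script could be split this way; the `timed_section` helper and the section names are hypothetical, and it prints the elapsed seconds itself so it does not depend on how the log viewer renders sections.

```shell
# Hypothetical helper: wrap a command in GitLab CI collapsible-section
# markers and print how long it took.
# $1 = section name (letters, digits, '_', '.', '-'), $2 = header text,
# remaining arguments = the command to run.
timed_section() {
    name=$1
    header=$2
    shift 2
    start=$(date +%s)
    # \033[0K clears the line so the raw marker text is hidden in the log.
    printf '\033[0Ksection_start:%s:%s\r\033[0K%s\n' "$start" "$name" "$header"
    # Running the command as an 'if' condition keeps 'set -e' from exiting
    # before the end marker is printed; $? in 'else' is the command's status.
    if "$@"; then status=0; else status=$?; fi
    end=$(date +%s)
    printf '\033[0Ksection_end:%s:%s\r\033[0K\n' "$end" "$name"
    echo "$name took $((end - start))s (exit status $status)"
    return "$status"
}

# Illustrative split (hypothetical; the real template's commands may differ):
# timed_section build "Compile" make -j"$JOBS" all check-build
# timed_section tests "Run tests" make -j"$JOBS" $MAKE_CHECK_ARGS
```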
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Thomas Huth @ 2024-09-13 13:55 UTC
To: Peter Maydell, qemu-devel
On 13/09/2024 15.31, Peter Maydell wrote:
> On Fri, 13 Sept 2024 at 13:24, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Thu, 12 Sept 2024 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>
>>> The cross-i686-tci CI job is persistently flaky with various tests
>>> hitting timeouts. One theory for why this is happening is that we're
>>> running too many tests in parallel and so sometimes a test gets
>>> starved of CPU and isn't able to complete within the timeout.
>>>
>>> (The environment this CI job runs in seems to cause us to default
>>> to a parallelism of 9 in the main CI.)
>>>
>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>> ---
>>> If this works we might be able to wind this up to -j2 or -j3,
>>> and/or consider whether other CI jobs need something similar.
>>
>> I gave this a try, but unfortunately the result seems to be
>> that the whole job times out:
>> https://gitlab.com/qemu-project/qemu/-/jobs/7818441897
>
> ...but then this simple retry passed with a runtime of 47 mins:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7819225200
>
> I'm tempted to commit this as-is, and see whether it helps.
FWIW, I just had a try with your patch, too, and it took 53 minutes:
https://gitlab.com/thuth/qemu/-/jobs/7818945368
Older jobs without your patch seem to take ~25 to ~30 minutes instead, so
the runtime definitely got much worse with -j1.
Considering that we're close to the 60-minute timeout, you might need to
bump the timeout of the job to 70 or 75 minutes now, to be on the safe side?
Or maybe really try -j2 first?
Thomas
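The margin argument can be checked with the numbers quoted in this thread: 47 and 53 minutes with -j1, against the current 60-minute limit and the proposed 70 or 75 minutes. The `headroom` helper below is a hypothetical illustration using integer shell arithmetic.

```shell
# Hypothetical helper: print the minutes left and the percentage of the
# job timeout used, for a given limit and observed runtime (both in minutes).
headroom() {
    limit=$1
    runtime=$2
    left=$((limit - runtime))
    used=$((runtime * 100 / limit))   # integer percentage, rounded down
    echo "timeout=${limit}m runtime=${runtime}m left=${left}m used=${used}%"
}

# Runtimes observed with -j1 in this thread, against each candidate limit.
for t in 60 70 75; do
    for r in 47 53; do
        headroom "$t" "$r"
    done
done
```

For the 53-minute run this gives 7 minutes left (88% of the budget used) at 60 minutes, versus 22 minutes left (70%) at 75 minutes. Since the first -j1 attempt exceeded the limit outright, the run-to-run spread is wider than these two samples alone suggest.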
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Daniel P. Berrangé @ 2024-09-13 14:05 UTC
To: Peter Maydell; +Cc: qemu-devel, Thomas Huth
On Fri, Sep 13, 2024 at 02:31:34PM +0100, Peter Maydell wrote:
> On Fri, 13 Sept 2024 at 13:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Thu, 12 Sept 2024 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> > >
> > > The cross-i686-tci CI job is persistently flaky with various tests
> > > hitting timeouts. One theory for why this is happening is that we're
> > > running too many tests in parallel and so sometimes a test gets
> > > starved of CPU and isn't able to complete within the timeout.
> > >
> > > (The environment this CI job runs in seems to cause us to default
> > > to a parallelism of 9 in the main CI.)
> > >
> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > ---
> > > If this works we might be able to wind this up to -j2 or -j3,
> > > and/or consider whether other CI jobs need something similar.
> >
> > I gave this a try, but unfortunately the result seems to be
> > that the whole job times out:
> > https://gitlab.com/qemu-project/qemu/-/jobs/7818441897
>
> ...but then this simple retry passed with a runtime of 47 mins:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7819225200
>
> I'm tempted to commit this as-is, and see whether it helps.
> If it doesn't I can always back it off to -j2, and if it does
> generate a lot of full-job-timeouts it's only me it's annoying.
Does anyone know how many vCPUs our k8s runners have?
The GitLab runners that contributor forks use have 2
vCPUs, so our current make -j$(expr $(nproc) + 1) is effectively
-j3 already in pipelines for forks. IOW, we intentionally
slightly over-commit CPUs right now. Backing off to just
-j$(nproc) may be better than hardcoding -j1/-j2, so that
it takes account of different runner sizes?
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v2] .gitlab-ci.d/crossbuilds.yml: Force 'make check' single-threaded for cross-i686-tci
From: Peter Maydell @ 2024-09-13 14:23 UTC
To: Daniel P. Berrangé; +Cc: qemu-devel, Thomas Huth
On Fri, 13 Sept 2024 at 15:05, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Sep 13, 2024 at 02:31:34PM +0100, Peter Maydell wrote:
> > On Fri, 13 Sept 2024 at 13:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> > >
> > > On Thu, 12 Sept 2024 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > >
> > > > The cross-i686-tci CI job is persistently flaky with various tests
> > > > hitting timeouts. One theory for why this is happening is that we're
> > > > running too many tests in parallel and so sometimes a test gets
> > > > starved of CPU and isn't able to complete within the timeout.
> > > >
> > > > (The environment this CI job runs in seems to cause us to default
> > > > to a parallelism of 9 in the main CI.)
> > > >
> > > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > > > ---
> > > > If this works we might be able to wind this up to -j2 or -j3,
> > > > and/or consider whether other CI jobs need something similar.
> > >
> > > I gave this a try, but unfortunately the result seems to be
> > > that the whole job times out:
> > > https://gitlab.com/qemu-project/qemu/-/jobs/7818441897
> >
> > ...but then this simple retry passed with a runtime of 47 mins:
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/7819225200
> >
> > I'm tempted to commit this as-is, and see whether it helps.
> > If it doesn't I can always back it off to -j2, and if it does
> > generate a lot of full-job-timeouts it's only me it's annoying.
>
> Anyone know how many vCPUs our k8s runners have ?
They report as 8, I think, given that in the main CI run this
job gets run as -j9. But we clearly aren't actually getting
a reliable 9 CPUs' worth.
-- PMM
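One possible explanation, not confirmed in this thread, is that `nproc` (depending on the coreutils version) counts the CPUs the process may be scheduled on, while a Kubernetes CPU limit is enforced as a CFS bandwidth quota. Assuming the runner pods use cgroup v2, that quota can be read from `/sys/fs/cgroup/cpu.max`, whose format is `<quota> <period>` (or `max <period>` when unlimited). The `quota_cpus` helper below is a hypothetical sketch of that check.

```shell
# Hypothetical helper: estimate usable CPUs from a cgroup v2 cpu.max file.
# $1 = path to cpu.max (defaults to /sys/fs/cgroup/cpu.max).
# Prints quota/period rounded down (minimum 1), or falls back to nproc
# when no quota is set ("max") or the file cannot be read.
quota_cpus() {
    file=${1:-/sys/fs/cgroup/cpu.max}
    if [ -r "$file" ]; then
        read -r quota period < "$file"
        if [ "$quota" != "max" ] && [ "${period:-0}" -gt 0 ]; then
            n=$((quota / period))
            if [ "$n" -lt 1 ]; then
                n=1
            fi
            echo "$n"
            return 0
        fi
    fi
    nproc
}

echo "nproc reports: $(nproc)"
echo "cgroup quota allows: $(quota_cpus)"
```

If the two numbers differ on a runner, that would be consistent with a job seeing 8 CPUs but not reliably getting 8 CPUs of time.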