* stable-rc-ci tests failing
@ 2023-08-02 8:56 Pavel Machek
2023-08-02 9:17 ` Chris Paterson
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
0 siblings, 2 replies; 10+ messages in thread
From: Pavel Machek @ 2023-08-02 8:56 UTC (permalink / raw)
To: chris.paterson2, cip-dev
Hi!
There seems to be something wrong with testing at the moment:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* RE: stable-rc-ci tests failing
2023-08-02 8:56 stable-rc-ci tests failing Pavel Machek
@ 2023-08-02 9:17 ` Chris Paterson
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
1 sibling, 0 replies; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 9:17 UTC (permalink / raw)
To: Pavel Machek, cip-dev@lists.cip-project.org
Hello Pavel,
> From: Pavel Machek <pavel@denx.de>
> Sent: Wednesday, August 2, 2023 9:57 AM
>
> Hi!
>
> There seems to be something wrong with testing at the moment:
Thank you for reporting.
The build container tag we're using got deleted by an automatic cleanup tool. (Same issue as last week).
I've modified the regex being used, so hopefully this won't happen again.
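
A minimal sketch of the idea, for illustration only: the cleanup tool and the regex actually in use are not shown in this thread, so the pattern, the tag names and the `tags_to_delete` helper below are all hypothetical. It shows how a keep-pattern plus an explicit "in use" list can stop a cleanup run from deleting a tag the pipeline still depends on.

```python
import re

# Hypothetical keep-pattern: "latest" and version-like tags such as
# "v14" or "v1.2.3" are never deleted. Not the regex used by the real tool.
KEEP_PATTERN = re.compile(r"latest|v\d+(\.\d+)*")


def tags_to_delete(tags, in_use=()):
    """Return the tags a cleanup run may delete.

    A tag survives if it fully matches KEEP_PATTERN or is listed in `in_use`.
    """
    protected = set(in_use)
    return [
        tag
        for tag in tags
        if tag not in protected and not KEEP_PATTERN.fullmatch(tag)
    ]


# Example with made-up tag names.
example_tags = ["latest", "v14", "tmp-build-123", "feature-x", "ci-2023-07"]
print(tags_to_delete(example_tags, in_use=["ci-2023-07"]))
```

Listing the tags referenced by the CI configuration in `in_use` gives a second safety net in case the pattern is too narrow.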
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
I've kicked this pipeline off again and it looks to be working again now.
Kind regards, Chris
>
> Best regards,
> Pavel
* RE: [cip-dev] stable-rc-ci tests failing
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
@ 2023-08-02 9:41 ` Chris Paterson
2023-08-02 10:57 ` Michael Adler
0 siblings, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 9:41 UTC (permalink / raw)
To: Pavel Machek, Adler, Michael; +Cc: cip-dev@lists.cip-project.org
> From: cip-dev@lists.cip-project.org <cip-dev@lists.cip-project.org> On
> Behalf Of Chris Paterson via lists.cip-project.org
> Sent: Wednesday, August 2, 2023 10:17 AM
>
> Hello Pavel,
>
> > From: Pavel Machek <pavel@denx.de>
> > Sent: Wednesday, August 2, 2023 9:57 AM
> >
> > Hi!
> >
> > There seems to be something wrong with testing at the moment:
>
> Thank you for reporting.
>
> The build container tag we're using got deleted by an automatic cleanup tool.
> (Same issue as last week).
> I've modified the regex being used, so hopefully this won't happen again.
>
> >
> >
> > https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
>
> I've kicked this pipeline off again and it looks to be working again now.
I spoke too soon.
Our "build" jobs are working fine.
Our "test" jobs are not. For some reason the "small" EC2 instances we spin up to run our test jobs aren't being started, so the test jobs never run. Grrr.
@Adler, Michael, please could you investigate?
Kind regards, Chris
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 9:41 ` [cip-dev] " Chris Paterson
@ 2023-08-02 10:57 ` Michael Adler
2023-08-02 11:52 ` Michael Adler
2023-08-02 12:25 ` Chris Paterson
0 siblings, 2 replies; 10+ messages in thread
From: Michael Adler @ 2023-08-02 10:57 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hi,
it looks like AWS has got some problems. From the logs:
"We currently do not have sufficient t3a.small capacity in the
Availability Zone you requested (eu-central-1a). Our system will be
working on provisioning additional capacity. You can currently get
t3a.small capacity by not specifying an Availability Zone in your
request or choosing eu-central-1b, eu-central-1c. Launching EC2
instance failed."
In general it seems like autoscaling has some issues in eu-central-1a
since I can also see i/o timeouts. I'll try to spin up a new cluster in
zone -1b.
Kind regards,
Michael
--
Siemens AG
Technology
Connectivity & Edge
Smart Embedded Systems
T CED SES-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
6684; WEEE-Reg.-No. DE 23691322
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 10:57 ` Michael Adler
@ 2023-08-02 11:52 ` Michael Adler
2023-08-02 12:26 ` Chris Paterson
2023-08-02 12:25 ` Chris Paterson
1 sibling, 1 reply; 10+ messages in thread
From: Michael Adler @ 2023-08-02 11:52 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
> In general it seems like autoscaling has some issues in eu-central-1a
> since I can also see i/o timeouts. I'll try to spin up a new cluster
> in zone -1b.
The new small cluster is up and running. Auto-scaling seems to work in
this zone.
--
Michael Adler
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-02 10:57 ` Michael Adler
2023-08-02 11:52 ` Michael Adler
@ 2023-08-02 12:25 ` Chris Paterson
2023-08-03 13:22 ` Michael Adler
1 sibling, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 12:25 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Wednesday, August 2, 2023 11:58 AM
>
>
> Hi,
>
> it looks like AWS has got some problems. From the logs:
Thank you for investigating.
>
> "We currently do not have sufficient t3a.small capacity in the
> Availability Zone you requested (eu-central-1a). Our system will be
> working on provisioning additional capacity. You can currently get
> t3a.small capacity by not specifying an Availability Zone in your
> request or choosing eu-central-1b, eu-central-1c. Launching EC2
> instance failed."
Doh.
Do we need to consider adding some sort of fallback support to gitlab-cloud-ci so it'll try a different region if we get this kind of failure?
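
One way such a fallback could look, as a rough sketch only: this is not gitlab-cloud-ci code. The `InsufficientCapacity` exception, `launch_with_fallback` and `fake_launch` are hypothetical names, and the launcher is a stand-in rather than a real AWS call. The AWS message quoted above names Availability Zones rather than regions, so the sketch falls back across zones.

```python
class InsufficientCapacity(Exception):
    """Stand-in for a launch failure caused by missing capacity in a zone."""


def launch_with_fallback(launch, zones):
    """Try each zone in order and return (zone, instance) for the first success.

    `launch` is any callable taking a zone name; it is expected to raise
    InsufficientCapacity when that zone cannot provide the instance.
    """
    failures = []
    for zone in zones:
        try:
            return zone, launch(zone)
        except InsufficientCapacity as exc:
            failures.append((zone, str(exc)))
    raise RuntimeError(f"no capacity in any zone: {failures}")


def fake_launch(zone):
    # Simulates the situation from the log: eu-central-1a has no capacity.
    if zone == "eu-central-1a":
        raise InsufficientCapacity("insufficient t3a.small capacity")
    return f"instance-in-{zone}"


zone, instance = launch_with_fallback(
    fake_launch, ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
)
print(zone, instance)
```

Only capacity errors trigger the fallback here; any other exception propagates, so unrelated failures are not silently retried in another zone.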
>
> In general it seems like autoscaling has some issues in eu-central-1a
> since I can also see i/o timeouts. I'll try to spin up a new cluster in
> zone -1b.
Thanks!
Kind regards, Chris
>
> Kind regards,
> Michael
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-02 11:52 ` Michael Adler
@ 2023-08-02 12:26 ` Chris Paterson
0 siblings, 0 replies; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 12:26 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Wednesday, August 2, 2023 12:52 PM
>
> > In general it seems like autoscaling has some issues in eu-central-1a
> > since I can also see i/o timeouts. I'll try to spin up a new cluster
> > in zone -1b.
>
> The new small cluster is up and running. Auto-scaling seems to work in
> this zone.
The new cluster seems to be working.
I've paused the old "cip-project-small-v14" runners in GitLab.
Kind regards, Chris
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 12:25 ` Chris Paterson
@ 2023-08-03 13:22 ` Michael Adler
2023-08-03 13:38 ` Chris Paterson
0 siblings, 1 reply; 10+ messages in thread
From: Michael Adler @ 2023-08-03 13:22 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hi Chris,
> Do we need to consider adding some sort of fallback support to
> gitlab-cloud-ci so it'll try a different region if we get this kind of
> failure?
yes, I've just implemented this. The small cluster will now (hopefully) try to
auto-scale across 3 different zones, thereby improving availability and
robustness. Previously we had a max node count of 5, now it's 6 (3
zones each having at most 2 nodes). Let's see how that works out in practice.
In particular I'm curious if the auto-scaler will choose a different zone if
it sees that a zone has reached its max count.
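
To make the numbers concrete, here is a small sketch of the behaviour in question. It is a toy model, not the real auto-scaler: `pick_zone`, `total_capacity` and the "least-loaded zone first" rule are assumptions made for illustration. Only the three zones and the limit of 2 nodes per zone come from this thread.

```python
ZONES = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
MAX_NODES_PER_ZONE = 2


def total_capacity(zones=ZONES, per_zone=MAX_NODES_PER_ZONE):
    """Maximum node count across all zones (3 zones x 2 nodes = 6)."""
    return len(zones) * per_zone


def pick_zone(node_counts, unavailable=(), per_zone=MAX_NODES_PER_ZONE):
    """Return the least-loaded zone that still has room, or None.

    `node_counts` maps zone name to current node count; zones listed in
    `unavailable` (for example, out of capacity) are skipped.
    """
    candidates = [
        zone
        for zone in ZONES
        if zone not in unavailable and node_counts.get(zone, 0) < per_zone
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda zone: node_counts.get(zone, 0))


print(total_capacity())
print(pick_zone({"eu-central-1a": 2}))
```

In this model a zone that has reached its limit is simply skipped, which is the behaviour one would hope to see from the real auto-scaler.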
If we don't face any issues with that new strategy, then I will upgrade the
large cluster at some point in the future as well.
Kind Regards,
Michael
--
Michael Adler
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-03 13:22 ` Michael Adler
@ 2023-08-03 13:38 ` Chris Paterson
2023-08-03 13:50 ` Michael Adler
0 siblings, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-03 13:38 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Thursday, August 3, 2023 2:23 PM
>
> Hi Chris,
>
> > Do we need to consider adding some sort of fallback support to
> > gitlab-cloud-ci so it'll try a different region if we get this kind of
> > failure?
>
> yes, I've just implemented this. The small cluster will now (hopefully) try to
> auto-scale across 3 different zones, thereby improving availability and
> robustness. Previously we had a max node count of 5, now it's 6 (3
> zones each having at most 2 nodes). Let's see how that works out in practice.
Thank you.
> In particular I'm curious if the auto-scaler will choose a different zone if
> it sees that a zone has reached its max count.
Is there any relevant text we'll see in the GitLab CI job log to indicate which region is being used?
Or can this only be seen from AWS?
> If we don't face any issues with that new strategy, then I will upgrade the
> large cluster at some point in the future as well.
Sounds like a plan.
Kind regards, Chris
>
> Kind Regards,
> Michael
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-03 13:38 ` Chris Paterson
@ 2023-08-03 13:50 ` Michael Adler
0 siblings, 0 replies; 10+ messages in thread
From: Michael Adler @ 2023-08-03 13:50 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
> Is there any relevant text we'll see in the GitLab CI job log to indicate which region is being used?
> Or can this only be seen from AWS?
This can only be seen from AWS. We're using zones eu-central-1a,
eu-central-1b, eu-central-1c for the small cluster now.
Kind Regards,
Michael
--
Michael Adler