* stable-rc-ci tests failing
@ 2023-08-02 8:56 Pavel Machek
2023-08-02 9:17 ` Chris Paterson
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
0 siblings, 2 replies; 10+ messages in thread
From: Pavel Machek @ 2023-08-02 8:56 UTC (permalink / raw)
To: chris.paterson2, cip-dev
Hi!
There seems to be something wrong with testing at the moment:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* RE: stable-rc-ci tests failing
2023-08-02 8:56 stable-rc-ci tests failing Pavel Machek
@ 2023-08-02 9:17 ` Chris Paterson
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
1 sibling, 0 replies; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 9:17 UTC (permalink / raw)
To: Pavel Machek, cip-dev@lists.cip-project.org
Hello Pavel,
> From: Pavel Machek <pavel@denx.de>
> Sent: Wednesday, August 2, 2023 9:57 AM
>
> Hi!
>
> There seems to be something wrong with testing at the moment:
Thank you for reporting.
The build container tag we're using got deleted by an automatic cleanup tool.
(Same issue as last week).
I've modified the regex being used, so hopefully this won't happen again.
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
I've kicked this pipeline off again and it looks to be working again now.
Kind regards,
Chris
>
> Best regards,
> Pavel
> --
> DENX Software Engineering GmbH, Managing Director: Erika Unter
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
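The thread does not say which cleanup tool or which regex was involved, so the following is only a minimal sketch of the general idea: a cleanup pass deletes tags that match a delete pattern unless they also match a keep pattern. The function name `tags_to_delete`, the patterns, and the tag names are all hypothetical and are not taken from the CIP setup.

```python
import re


def tags_to_delete(tags, delete_regex, keep_regex):
    """Return the tags that match delete_regex but not keep_regex.

    Hypothetical helper: both patterns must match the whole tag name.
    """
    delete_re = re.compile(delete_regex)
    keep_re = re.compile(keep_regex)
    return [
        tag for tag in tags
        if delete_re.fullmatch(tag) and not keep_re.fullmatch(tag)
    ]


if __name__ == "__main__":
    # Made-up tag names; "latest" and "v<number>" stand in for tags still in use.
    example_tags = ["latest", "v5", "tmp-1234", "tmp-5678"]
    print(tags_to_delete(example_tags, r".*", r"latest|v\d+"))
```

With a keep pattern that covers the tag the pipelines pull, a broad delete pattern no longer removes it.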
[parent not found: <17778648DB0D91E2.2497@lists.cip-project.org>]
* RE: [cip-dev] stable-rc-ci tests failing
[not found] ` <17778648DB0D91E2.2497@lists.cip-project.org>
@ 2023-08-02 9:41 ` Chris Paterson
2023-08-02 10:57 ` Michael Adler
0 siblings, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 9:41 UTC (permalink / raw)
To: Pavel Machek, Adler, Michael; +Cc: cip-dev@lists.cip-project.org
> From: cip-dev@lists.cip-project.org <cip-dev@lists.cip-project.org> On
> Behalf Of Chris Paterson via lists.cip-project.org
> Sent: Wednesday, August 2, 2023 10:17 AM
>
> Hello Pavel,
>
> > From: Pavel Machek <pavel@denx.de>
> > Sent: Wednesday, August 2, 2023 9:57 AM
> >
> > Hi!
> >
> > There seems to be something wrong with testing at the moment:
>
> Thank you for reporting.
>
> The build container tag we're using got deleted by an automatic cleanup tool.
> (Same issue as last week).
> I've modified the regex being used, so hopefully this won't happen again.
>
> >
> > https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/952990993
>
> I've kicked this pipeline off again and it looks to be working again now.
I spoke too soon.
Our "build" jobs are working fine. Our "test" jobs are not.
For some reason the "small" EC2 instances we spool up to run our testing jobs aren't being started, so our test jobs aren't being run. Grrr.
@Adler, Michael, please could you investigate?
Kind regards,
Chris
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 9:41 ` [cip-dev] " Chris Paterson
@ 2023-08-02 10:57 ` Michael Adler
2023-08-02 11:52 ` Michael Adler
2023-08-02 12:25 ` Chris Paterson
0 siblings, 2 replies; 10+ messages in thread
From: Michael Adler @ 2023-08-02 10:57 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hi,
it looks like AWS has got some problems. From the logs:
"We currently do not have sufficient t3a.small capacity in the
Availability Zone you requested (eu-central-1a). Our system will be
working on provisioning additional capacity. You can currently get
t3a.small capacity by not specifying an Availability Zone in your
request or choosing eu-central-1b, eu-central-1c. Launching EC2
instance failed."
In general it seems like autoscaling has some issues in eu-central-1a
since I can also see i/o timeouts. I'll try to spin up a new cluster in
zone -1b.
Kind regards,
Michael
--
Siemens AG
Technology
Connectivity & Edge
Smart Embedded Systems
T CED SES-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
6684; WEEE-Reg.-No. DE 23691322
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 10:57 ` Michael Adler
@ 2023-08-02 11:52 ` Michael Adler
2023-08-02 12:26 ` Chris Paterson
2023-08-02 12:25 ` Chris Paterson
1 sibling, 1 reply; 10+ messages in thread
From: Michael Adler @ 2023-08-02 11:52 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
> In general it seems like autoscaling has some issues in eu-central-1a
> since I can also see i/o timeouts. I'll try to spin up a new cluster
> in zone -1b.
The new small cluster is up and running. Auto-scaling seems to work in
this zone.
--
Michael Adler
Siemens AG
Technology
Connectivity & Edge
Smart Embedded Systems
T CED SES-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
6684; WEEE-Reg.-No. DE 23691322
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-02 11:52 ` Michael Adler
@ 2023-08-02 12:26 ` Chris Paterson
0 siblings, 0 replies; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 12:26 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Wednesday, August 2, 2023 12:52 PM
>
> > In general it seems like autoscaling has some issues in eu-central-1a
> > since I can also see i/o timeouts. I'll try to spin up a new cluster
> > in zone -1b.
>
> The new small cluster is up and running. Auto-scaling seems to work in
> this zone.
The new cluster seems to be working.
I've paused the old "cip-project-small-v14" runners in GitLab.
Kind regards,
Chris
>
> --
> Michael Adler
>
> Siemens AG
> Technology
> Connectivity & Edge
> Smart Embedded Systems
> T CED SES-DE
> Otto-Hahn-Ring 6
> 81739 Munich, Germany
>
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
> Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
> Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
> Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
> Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
> 6684; WEEE-Reg.-No. DE 23691322
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-02 10:57 ` Michael Adler
2023-08-02 11:52 ` Michael Adler
@ 2023-08-02 12:25 ` Chris Paterson
2023-08-03 13:22 ` Michael Adler
1 sibling, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-02 12:25 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Wednesday, August 2, 2023 11:58 AM
>
>
> Hi,
>
> it looks like AWS has got some problems. From the logs:
Thank you for investigating.
>
> "We currently do not have sufficient t3a.small capacity in the
> Availability Zone you requested (eu-central-1a). Our system will be
> working on provisioning additional capacity. You can currently get
> t3a.small capacity by not specifying an Availability Zone in your
> request or choosing eu-central-1b, eu-central-1c. Launching EC2
> instance failed."
Doh.
Do we need to consider adding some sort of fallback support to gitlab-cloud-ci so it'll try a different region if we get this kind of failure?
>
> In general it seems like autoscaling has some issues in eu-central-1a
> since I can also see i/o timeouts. I'll try to spin up a new cluster in
> zone -1b.
Thanks!
Kind regards,
Chris
>
> Kind regards,
> Michael
>
> --
> Siemens AG
> Technology
> Connectivity & Edge
> Smart Embedded Systems
> T CED SES-DE
> Otto-Hahn-Ring 6
> 81739 Munich, Germany
>
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
> Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
> Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
> Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
> Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
> 6684; WEEE-Reg.-No. DE 23691322
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-02 12:25 ` Chris Paterson
@ 2023-08-03 13:22 ` Michael Adler
2023-08-03 13:38 ` Chris Paterson
0 siblings, 1 reply; 10+ messages in thread
From: Michael Adler @ 2023-08-03 13:22 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hi Chris,
> Do we need to consider adding some sort of fallback support to
> gitlab-cloud-ci so it'll try a different region if we get this kind of
> failure?
yes, I've just implemented this. The small cluster will now (hopefully) try to
auto-scale across 3 different zones, thereby improving availability and
robustness. Previously we had a max node count of 5, now it's 6 (3
zones each having at most 2 nodes). Let's see how that works out in practice.
In particular I'm curious if the auto-scaler will choose a different zone if
it sees that a zone has reached its max count.
If we don't face any issues with that new strategy, then I will upgrade the
large cluster at some point in the future as well.
Kind Regards,
Michael
--
Michael Adler
Siemens AG
Technology
Connectivity & Edge
Smart Embedded Systems
T CED SES-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
6684; WEEE-Reg.-No. DE 23691322
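The per-zone limit described above (3 zones, at most 2 nodes each, 6 in total) can be illustrated with a small sketch. This is not the gitlab-cloud-ci implementation and it says nothing about how the real auto-scaler behaves, which is exactly the open question in the message. The zone names come from the thread; the function `pick_zone` and its least-loaded selection policy are assumptions made for illustration.

```python
# Zone names are taken from the thread; everything else is hypothetical.
ZONES = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
MAX_NODES_PER_ZONE = 2  # 3 zones x 2 nodes = 6 nodes in total


def pick_zone(node_counts, unavailable=frozenset()):
    """Pick the least-loaded zone that is below its cap and not unavailable.

    node_counts maps zone name to the number of running nodes (missing = 0).
    unavailable holds zones that recently reported insufficient capacity.
    Returns None when every zone is full or unavailable.
    """
    candidates = [
        zone for zone in ZONES
        if zone not in unavailable
        and node_counts.get(zone, 0) < MAX_NODES_PER_ZONE
    ]
    if not candidates:
        return None
    # min() keeps the first zone in ZONES order when counts are tied.
    return min(candidates, key=lambda zone: node_counts.get(zone, 0))


if __name__ == "__main__":
    # eu-central-1a is at its cap, so the next node goes elsewhere.
    print(pick_zone({"eu-central-1a": 2}))
    # eu-central-1a reported a capacity error, so it is skipped.
    print(pick_zone({}, unavailable={"eu-central-1a"}))
```

Under this assumed policy a full or failing zone is simply skipped, which is the fallback behaviour asked about earlier in the thread.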
* RE: [cip-dev] stable-rc-ci tests failing
2023-08-03 13:22 ` Michael Adler
@ 2023-08-03 13:38 ` Chris Paterson
2023-08-03 13:50 ` Michael Adler
0 siblings, 1 reply; 10+ messages in thread
From: Chris Paterson @ 2023-08-03 13:38 UTC (permalink / raw)
To: Michael Adler; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
Hello Michael,
> From: Michael Adler <michael.adler@siemens.com>
> Sent: Thursday, August 3, 2023 2:23 PM
>
> Hi Chris,
>
> > Do we need to consider adding some sort of fallback support to
> > gitlab-cloud-ci so it'll try a different region if we get this kind of
> > failure?
>
> yes, I've just implemented this. The small cluster will now (hopefully) try to
> auto-scale across 3 different zones, thereby improving availability and
> robustness. Previously we had a max node count of 5, now it's 6 (3
> zones each having at most 2 nodes). Let's see how that works out in practice.
Thank you.
> In particular I'm curious if the auto-scaler will choose a different zone if
> it sees that a zone has reached its max count.
Is there any relevant text we'll see in the GitLab CI job log to indicate which region is being used?
Or can this only be seen from AWS?
> If we don't face any issues with that new strategy, then I will upgrade the
> large cluster at some point in the future as well.
Sounds like a plan.
Kind regards,
Chris
>
> Kind Regards,
> Michael
>
> --
> Michael Adler
>
> Siemens AG
> Technology
> Connectivity & Edge
> Smart Embedded Systems
> T CED SES-DE
> Otto-Hahn-Ring 6
> 81739 Munich, Germany
>
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
> Hagemann
> Snabe; Managing Board: Roland Busch, Chairman, President and Chief
> Executive
> Officer; Cedrik Neike, Matthias Rebellius, Ralf P. Thomas, Judith Wiese;
> Registered offices: Berlin and Munich, Germany; Commercial registries:
> Berlin-Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE
> 23691322
* Re: [cip-dev] stable-rc-ci tests failing
2023-08-03 13:38 ` Chris Paterson
@ 2023-08-03 13:50 ` Michael Adler
0 siblings, 0 replies; 10+ messages in thread
From: Michael Adler @ 2023-08-03 13:50 UTC (permalink / raw)
To: Chris Paterson; +Cc: Pavel Machek, cip-dev@lists.cip-project.org
> Is there any relevant text we'll see in the GitLab CI job log to indicate which region is being used?
> Or can this only be seen from AWS?
This can only be seen from AWS. We're using zones eu-central-1a,
eu-central-1b, eu-central-1c for the small cluster now.
Kind Regards,
Michael
--
Michael Adler
Siemens AG
Technology
Connectivity & Edge
Smart Embedded Systems
T CED SES-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim
Hagemann Snabe; Managing Board: Roland Busch, Chairman, President and
Chief Executive Officer; Cedrik Neike, Matthias Rebellius, Ralf P.
Thomas, Judith Wiese; Registered offices: Berlin and Munich, Germany;
Commercial registries: Berlin-Charlottenburg, HRB 12300, Munich, HRB
6684; WEEE-Reg.-No. DE 23691322