* out of CI pipeline minutes again @ 2023-02-23 12:56 Peter Maydell 2023-02-23 13:46 ` Thomas Huth 2023-02-23 15:28 ` Ben Dooks 0 siblings, 2 replies; 19+ messages in thread From: Peter Maydell @ 2023-02-23 12:56 UTC (permalink / raw) To: QEMU Developers Hi; the project is out of gitlab CI pipeline minutes again. In the absence of any other proposals, no more pull request merges will happen til 1st March... -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell @ 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:28 ` Ben Dooks 1 sibling, 2 replies; 19+ messages in thread From: Thomas Huth @ 2023-02-23 13:46 UTC (permalink / raw) To: Peter Maydell, QEMU Developers On 23/02/2023 13.56, Peter Maydell wrote: > Hi; the project is out of gitlab CI pipeline minutes again. > In the absence of any other proposals, no more pull request > merges will happen til 1st March... I'd like to propose again to send a link along with the pull request that shows that the shared runners are all green in the fork of the requester. You'd only need to check the custom runners in that case, which hopefully still work fine without CI minutes? It's definitely more cumbersome, but maybe better than queuing dozens of pull requests right in front of the soft freeze? Thomas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 13:46 ` Thomas Huth @ 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 1 sibling, 0 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 14:14 UTC (permalink / raw) To: Thomas Huth; +Cc: Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 02:46:40PM +0100, Thomas Huth wrote: > On 23/02/2023 13.56, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > I'd like to propose again to send a link along with the pull request that > shows that the shared runners are all green in the fork of the requester. > You'd only need to check the custom runners in that case, which hopefully > still work fine without CI minutes? The maintainer's fork will almost certainly not be against current HEAD though. So test results from them will not be equivalent to the tests that Peter normally does on staging, which reflects the result of merging current HEAD + the pull request. Sometimes that won't matter, but especially near freeze when we have a high volume of pull requests, I think that's an important difference to reduce risk of regressions. > It's definitely more cumbersome, but maybe better than queuing dozens of > pull requests right in front of the soft freeze? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé @ 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:00 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Warner Losh @ 2023-02-23 14:15 UTC (permalink / raw) To: Thomas Huth; +Cc: Peter Maydell, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 896 bytes --] On Thu, Feb 23, 2023, 6:48 AM Thomas Huth <thuth@redhat.com> wrote: > On 23/02/2023 13.56, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > I'd like to propose again to send a link along with the pull request that > shows that the shared runners are all green in the fork of the requester. > You'd only need to check the custom runners in that case, which hopefully > still work fine without CI minutes? > > It's definitely more cumbersome, but maybe better than queuing dozens of > pull requests right in front of the soft freeze? > Yea. I'm just getting done with my pull request and it's really demotivating to be done early and miss the boat... I'm happy to do this because it's what I do anyway before sending a pull... Warner Thomas > > > [-- Attachment #2: Type: text/html, Size: 1591 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 14:15 ` Warner Losh @ 2023-02-23 15:00 ` Daniel P. Berrangé 0 siblings, 0 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 15:00 UTC (permalink / raw) To: Warner Losh; +Cc: Thomas Huth, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 07:15:47AM -0700, Warner Losh wrote: > On Thu, Feb 23, 2023, 6:48 AM Thomas Huth <thuth@redhat.com> wrote: > > > On 23/02/2023 13.56, Peter Maydell wrote: > > > Hi; the project is out of gitlab CI pipeline minutes again. > > > In the absence of any other proposals, no more pull request > > > merges will happen til 1st March... > > > > I'd like to propose again to send a link along with the pull request that > > shows that the shared runners are all green in the fork of the requester. > > You'd only need to check the custom runners in that case, which hopefully > > still work fine without CI minutes? > > > > It's definitely more cumbersome, but maybe better than queuing dozens of > > pull requests right in front of the soft freeze? > > > > Yea. I'm just getting done with my pull request and it's really > demotivating to be done early and miss the boat... Send your pull request anyway, so it is in the queue to be handled. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell 2023-02-23 13:46 ` Thomas Huth @ 2023-02-23 15:28 ` Ben Dooks 2023-02-23 15:33 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Ben Dooks @ 2023-02-23 15:28 UTC (permalink / raw) To: Peter Maydell; +Cc: QEMU Developers On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: > Hi; the project is out of gitlab CI pipeline minutes again. > In the absence of any other proposals, no more pull request > merges will happen til 1st March... Is there a way of sponsoring more minutes, could people provide runner resources to help? -- Ben Dooks, ben@fluff.org, http://www.fluff.org/ben/ Large Hadron Colada: A large Pina Colada that makes the universe disappear. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:28 ` Ben Dooks @ 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:54 ` Paolo Bonzini 0 siblings, 2 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 15:33 UTC (permalink / raw) To: Ben Dooks; +Cc: Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 03:28:37PM +0000, Ben Dooks wrote: > On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > Is there a way of sponsoring more minutes, could people provide > runner resources to help? IIUC, we already have available compute resources from a couple of sources we could put into service. The main issue is someone to actually configure them to act as runners *and* maintain their operation indefinitely going forward. The sysadmin problem is what made/makes gitlab's shared runners so incredibly appealing. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:33 ` Daniel P. Berrangé @ 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann ` (2 more replies) 2023-02-24 9:54 ` Paolo Bonzini 1 sibling, 3 replies; 19+ messages in thread From: Eldon Stegall @ 2023-02-23 22:11 UTC (permalink / raw) To: Daniel P. Berrangé; +Cc: Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > IIUC, we already have available compute resources from a couple of > sources we could put into service. The main issue is someone to > actually configure them to act as runners *and* maintain their > operation indefinitely going forward. The sysadmin problem is > what made/makes gitlab's shared runners so incredibly appealing. Hello, I would like to do this, but the path to contribute in this way isn't clear to me at this moment. I made it as far as creating a GitLab fork of QEMU, and then attempting to create a merge request from my branch in order to test the GitLab runner I have provisioned. Not having previously tried to contribute via GitLab, I was a bit stymied that it is not even possibly to create a merge request unless I am a member of the project? I clicked a button to request access. Alex's plan from last month sounds feasible: - provisioning scripts in scripts/ci/setup (if existing not already good enough) - tweak to handle multiple runner instances (or more -j on the build) - changes to .gitlab-ci.d/ so we can use those machines while keeping ability to run on shared runners for those outside the project Daniel, you pointed out the importance of reproducibility, and thus the use of the two-step process, build-docker, and then test-in-docker, so it seems that only docker and the gitlab agent would be strong requirements for running the jobs? I feel like the greatest win for this would be to at least host the cirrus-run jobs on a dedicated runner because the machine seems to simply be burning double minutes until the cirrus job is complete, so I would expect the GitLab runner requirements for those jobs to be low? If there are some other steps that I should take to contribute in this capacity, please let me know. Maybe I could send a patch to tag cirrus jobs in the same way that the s390x jobs are currently tagged, so that we could run those separately? Thanks, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall @ 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2 siblings, 0 replies; 19+ messages in thread From: Gerd Hoffmann @ 2023-02-24 9:16 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 10:11:11PM +0000, Eldon Stegall wrote: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > > IIUC, we already have available compute resources from a couple of > > sources we could put into service. The main issue is someone to > > actually configure them to act as runners *and* maintain their > > operation indefinitely going forward. The sysadmin problem is > > what made/makes gitlab's shared runners so incredibly appealing. I have a gitlab runner active on a hosted machine. It builds on fedora coreos and doesn't need much baby-sitting. Just copied the bits into a new repository and pushed to https://gitlab.com/kraxel/coreos-gitlab-runner > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? The above works just fine as replacement for the shared runners. Can also run in parallel to the shared runners, but it's slower on picking up jobs, so it'll effectively take over when you ran out of minutes. take care, Gerd ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann @ 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-02-24 14:07 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: >> IIUC, we already have available compute resources from a couple of >> sources we could put into service. The main issue is someone to >> actually configure them to act as runners *and* maintain their >> operation indefinitely going forward. The sysadmin problem is >> what made/makes gitlab's shared runners so incredibly appealing. > > Hello, > > I would like to do this, but the path to contribute in this way isn't clear to > me at this moment. I made it as far as creating a GitLab fork of QEMU, and then > attempting to create a merge request from my branch in order to test the GitLab > runner I have provisioned. Not having previously tried to contribute via > GitLab, I was a bit stymied that it is not even possibly to create a merge > request unless I am a member of the project? I clicked a button to request > access. We don't process merge requests and shouldn't need them to run CI. By default a pushed branch won't trigger testing so we document a way to tweak your GIT config to set the QEMU_CI environment so: git push-ci-now -f gitlab will trigger the testing. See: https://qemu.readthedocs.io/en/latest/devel/ci.html#custom-ci-cd-variables > > Alex's plan from last month sounds feasible: > > - provisioning scripts in scripts/ci/setup (if existing not already > good enough) > - tweak to handle multiple runner instances (or more -j on the build) > - changes to .gitlab-ci.d/ so we can use those machines while keeping > ability to run on shared runners for those outside the project > > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? Yeah the current provisioning scripts install packages to the host. We'd like to avoid that and use the runner inside our docker images rather than polluting the host with setup. Although in practice some hosts pull double duty and developers want to be able to replicate the setup when chasing CI errors so will likely install the packages anyway. > > I feel like the greatest win for this would be to at least host the > cirrus-run jobs on a dedicated runner because the machine seems to > simply be burning double minutes until the cirrus job is complete, so I > would expect the GitLab runner requirements for those jobs to be low? > > If there are some other steps that I should take to contribute in this > capacity, please let me know. > > Maybe I could send a patch to tag cirrus jobs in the same way that the > s390x jobs are currently tagged, so that we could run those separately? > > Thanks, > Eldon -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
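For concreteness, the push-ci-now alias Alex refers to is built on GitLab's "ci.variable" push option. The authoritative recipe and the exact meaning of QEMU_CI are in the linked docs; the sketch below assumes QEMU_CI=1 means "create the pipeline but leave it to be started manually" and QEMU_CI=2 means "create the pipeline and run it immediately":

    # assumed semantics, see the docs linked above:
    #   QEMU_CI=1  create pipeline, start jobs manually
    #   QEMU_CI=2  create pipeline and run it immediately
    git config --local alias.push-ci     "push -o ci.variable=QEMU_CI=1"
    git config --local alias.push-ci-now "push -o ci.variable=QEMU_CI=2"

    # with a 'gitlab' remote pointing at a personal fork:
    git push-ci-now -f gitlab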
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée @ 2023-02-27 16:59 ` Daniel P. Berrangé 2023-02-27 17:43 ` Stefan Hajnoczi 2 siblings, 1 reply; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-27 16:59 UTC (permalink / raw) To: Eldon Stegall; +Cc: Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 10:11:11PM +0000, Eldon Stegall wrote: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > > IIUC, we already have available compute resources from a couple of > > sources we could put into service. The main issue is someone to > > actually configure them to act as runners *and* maintain their > > operation indefinitely going forward. The sysadmin problem is > > what made/makes gitlab's shared runners so incredibly appealing. > > Hello, > > I would like to do this, but the path to contribute in this way isn't clear to > me at this moment. I made it as far as creating a GitLab fork of QEMU, and then > attempting to create a merge request from my branch in order to test the GitLab > runner I have provisioned. Not having previously tried to contribute via > GitLab, I was a bit stymied that it is not even possibly to create a merge > request unless I am a member of the project? I clicked a button to request > access. > > Alex's plan from last month sounds feasible: > > - provisioning scripts in scripts/ci/setup (if existing not already > good enough) > - tweak to handle multiple runner instances (or more -j on the build) > - changes to .gitlab-ci.d/ so we can use those machines while keeping > ability to run on shared runners for those outside the project > > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? Almost our entire CI setup is built around use of docker and I don't believe we really want to change that. Even ignoring GitLab, pretty much all public CI services support use of docker containers for the CI environment, so that's a defacto standard. So while git gitlab runner agent can support many different execution environments, I don't think we want to consider any except for the ones that support containers (and that would need docker-in-docker to be enabled too). Essentially we'll be using GitLab free CI credits for most of the month. What we need is some extra private CI resource that can pick up the the slack when we run out of free CI credits each month. Thus the private CI resource needs to be compatible with the public shared runners, by providing the same docker based environment[1]. It is a great shame that our current private runners ansible playbooks were not configuring thue system for use with docker, as that would have got us 90% of the way there already. One thing to bear in mind is that a typical QEMU pipeline has 130 jobs running. Each gitlab shared runner is 1 vCPU, 3.75 GB of RAM, and we're using as many as 60-70 of such instances at a time. A single physical machine probably won't cope unless it is very big. To avoid making the overall pipeline wallclock time too long, we need to be able to handle a large number of parallel jobs at certain times. We're quite peaky in our usage. Some days we merge nothing and so consume no CI. Some days we may merge many PRs and so consumes lots of CI. So buying lots of VMs to run 24x7 is quite wasteful. 
A burstable container service is quite appealing. IIUC, GitLab's shared runners use GCP's "spot" instances which are cheaper than regular instances. The downside is that the VM can get killed/descheduled if something higher priority needs Google's resources. Not too nice for reliability, but excellent for cost saving. With regards, Daniel [1] There are still several ways to achieve this. A bare metal machine with a local install of docker, or podman, vs pointing to a public k8s instance that can run containers, and possibly other options too. -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
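To make the compatibility requirement above concrete: a private runner that can stand in for the shared runners is usually a docker-executor gitlab-runner with privileged mode enabled, so the container-build (docker-in-docker) jobs keep working. A minimal config.toml sketch; the name, token, images and concurrency are placeholders, not anything the project has deployed:

    concurrent = 8                     # jobs this host will run in parallel

    [[runners]]
      name     = "qemu-private-x86"    # placeholder
      url      = "https://gitlab.com/"
      token    = "REDACTED"            # obtained when registering the runner
      executor = "docker"
      [runners.docker]
        image      = "docker:24"       # default image if a job names none
        privileged = true              # needed for docker-in-docker jobs
        volumes    = ["/cache"]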
* Re: out of CI pipeline minutes again 2023-02-27 16:59 ` Daniel P. Berrangé @ 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall 2023-03-21 16:40 ` Daniel P. Berrangé 0 siblings, 2 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2023-02-27 17:43 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers Here are IRC logs from a discussion that has taken place about this topic. Summary: - QEMU has ~$500/month Azure credits available that could be used for CI - Burstable VMs like Azure AKS nodes seem like a good strategy in order to minimize hosting costs and parallelize when necessary to keep CI duration low - Paolo is asking for someone from Red Hat to dedicate the time to set up Azure AKS with GitLab CI - Personally, I don't think this should exclude other efforts like Eldon's. We can always add more private runners! 11:31 < pm215> does anybody who understands how the CI stuff works have the time to take on the task of getting this all to work with either (a) a custom runner running on one of the hosts we've been offered or (b) whatever it needs to run using our donated azure credits ? 11:34 < danpb> i would love to, but i can't volunteer my time right now :-( 11:34 < stefanha> Here is the email thread for anyone else who's trying to catch up (like me): https://lore.kernel.org/qemu-devel/CAFEAcA83u_ENxDj3GJKa-xv6eLJGJPr_9FRDKAqm3qACyhrTgg@mail.gmail.com/ 11:34 -!- iggy [~iggy@47.152.10.131] has quit [Quit: WeeChat 3.5] 11:35 -!- peterx_ is now known as peterx 11:35 < danpb> what paolo suggested about using the Kubernetes runners for Azure seems like the ideal approach 11:35 -!- peterx [~xz@bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca] has quit [Quit: WeeChat 3.6] 11:35 -!- peterx [~xz@bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca] has joined #qemu 11:36 < danpb> as it would be most cost effective in terms of Azure resources consumed, and would scale well as it would support as many parallel runners as we can afford from our Azure allowance 11:36 < danpb> but its also probably ythe more complex option to setup 11:36 -!- dramforever [~dramforev@59.66.131.33] has quit [Ping timeout: 480 seconds] 11:37 < stefanha> It's a little sad to see the people who are volunteering to help being ignored in the email thread. 11:37 < stefanha> Ben Dooks asked about donating minutes (the easiest solution). 11:38 -!- sgarzare [~sgarzare@c-115-213.cust-q.wadsl.it] has quit [Remote host closed the connection] 11:38 < peterx> jsnow[m]: an AHCI question: where normally does the address in ahci_map_fis_address() (aka, AHCIPortRegs.is_addr[_hi]) reside? Should that be part of guest RAM that the guest AHCI driver maps? 11:38 < stefanha> eldondev is in the process of setting up a runner. 11:38 < pm215> stefanha: the problem is there is no one person who has all of (a) the authority to do stuff (b) the knowledge of what the right thing to do is (c) the time to do it... 11:38 < danpb> stefanha: the challenge is how to accept the donation in a sustainable way 11:39 < th_huth> stefanha: Do you know whether we still got that fosshost server around? ... I know, fosshost is going away, but maybe we could use it as a temporary solution at least 11:39 < stefanha> th_huth: fosshost, the organization, has ceased to operate. 
11:39 < danpb> fosshost ceased operation as a concept 11:39 < davidgiluk> I wonder if there's a way to make it so that those of us with hardware can avoid eating into the central CI count 11:39 < danpb> if we still have any access to a machine its just by luck 11:39 < stefanha> th_huth: QEMU has 2 smallish x86 VMs ready to at Oregon State University Open Source Lab. 11:40 < stefanha> (2 vCPUs and 4 GB RAM, probably not enough for private runners) 11:40 -!- amorenoz [~amorenoz@139.47.72.25] has quit [Read error: Connection reset by peer] 11:41 < peterx> jsnow[m]: the context is that someone is optimizing migration by postponing all memory updates to after some point, and the AHCI post_load is not happy because it cannot find/map the FIS address here due to delayed memory commit() (https://pastebin.com/kADnTKzp). However when I look at that I failed to see why if it's in normal RAM (but I think I had somewhere wrong) 11:41 < danpb> stefanha: fyi gitlab's shared runners are 1 vCPU, 3.75 GB of RAM by default 11:41 < stefanha> Gerd Hoffmann seems to have a hands-off/stateless private runner setup that can be scaled to multiple machines. 11:41 < stefanha> danpb: :) 11:41 < danpb> stefanha: so those two small VMs are equivalent to 2 runners, and with 30-40 jobs we run in parallel 11:41 < stefanha> The first thing that needs to be decided is which approach to take: 11:41 < jsnow[m]> peterx: I'm rusty but I think so. the guest writes an FIS (Frame Information Structure) for the card to read and operate on 11:41 < danpb> stefanha: just having two runners is going to make our pipelines take x20 longer to finish 11:41 < stefanha> 1. Donating more minutes to GitLab CI 11:42 < stefanha> 2. Using Azure to spawn runners 11:42 < stsquad> stefanha I think the main bottleneck is commissioning and admin'ing machines - but we have ansible playbooks to do it for our other custom runners so it should "just" be a case of writing one for an x86 runner 11:42 < stefanha> 3. Hosting runners on servers 11:42 < danpb> that's why i think the Azure k8s executor sounded promising - it would burst upto 20-30 jobs in parallel for the short time we run CI 11:42 < danpb> without us having to pay for 30 vms 24x7 11:42 < stsquad> who actually understands configuring k8s? 11:42 < th_huth> stefanha: I read that fosshost announcement that they will be going away ... not that they have already terminated everything ... but sure, it's not something sustainable 11:42 < stefanha> stsquad: Yep, kraxel's approach solves that because it's stateless/automated. 11:43 < peterx> jsnow[m]: thanks, then let me double check 11:43 < jsnow[m]> peterx: iirc the guest writes the address of the FIS to a register and then the pci card maps that address to read the larger command structure 11:43 < danpb> stsquad (@_oftc_stsquad:matrix.org): i don't think we'd need to configure k8s itself, just figure out how to point gitlab runner as the azure k8s service 11:44 < stefanha> danpb: Has someone calculated the cost needed to run QEMU CI on Azure? It's great that we can burst it when needed, but will the free Azure quota be enough? 11:44 -!- mmu_man [~revol@188410969.box.freepro.com] has joined #qemu 11:44 < danpb> stsquad: the problem with our current ansible playbooks is that none of them used docker AFAIR, they just setup the gitlab runer as bare metal 11:44 < peterx> jsnow[m]: yes, the thing is if that's the case the RAM should have been there when post_load() even without commit anything, so maybe there's something else I missed 11:44 < stefanha> i.e. 
will we just hit a funding wall again but on Azure instead of on GitLab? 11:44 < danpb> stefanha don't think anyone's calculated it, would hafve to ask bonzini what we actually get access to 11:45 < danpb> what would help is that we would not need azure for the whole month 11:45 < danpb> we would onl need it to fill in the gap when gitlab allowance is consumed 11:45 < stsquad> danpb I'm sure they can be re-written - I can't recall what stopped us using docker in the first place 11:46 < stsquad> but I'm a little wary of experimenting on the live CI server 11:46 < danpb> they wanted to run avocado tests which utilize some bare metal features 11:46 < stsquad> ahh that would be it 11:46 < stsquad> access to /dev/kvm 11:46 < danpb> i suggested that we set it up to expose KVM etc to the container but it wasn't done that way :-( 11:49 < stefanha> danpb: A simple estimate would be: "QEMU uses 50k CI minutes around the 20th of each month, so thats 50/20 * 10 more days = 25k CI minutes needed to cover those last 10 days" 11:49 < stefanha> Assuming GitLab CI minutes are equivalent to Azure k8s minutes 11:50 < stefanha> and then multiply 25k minutes by the Azure instance price rate. 11:51 < stefanha> ISTR the Azure quota is manually renewed by bonzini[m]. It may have been something like $10k and we use $2k of it for non-CI stuff at the moment. 11:52 < stefanha> I'm not sure if the $10k is renewed annually or semi-annually. 11:52 -!- genpaku_ [~genpaku@107.191.100.185] has quit [Read error: Connection reset by peer] 11:52 < stefanha> So maybe $8k available per year. 11:52 < dwmw2_gone> I feel I ought to be able to round up some VM instances too. 11:53 -!- farosas [~farosas@177.103.113.244] has quit [Quit: Leaving] 11:53 -!- farosas [~farosas@177.103.113.244] has joined #qemu 11:54 < bonzini> stefanha: right, more like $3k to be safe 11:54 < bonzini> dwmw2_gone: the right thing to do would be to set up kubernetes/Fargate 11:54 < bonzini> same for Azure 11:54 -!- zzhu [~zzhu@072-182-049-214.res.spectrum.com] has quit [Remote host closed the connection] 11:55 < bonzini> dwmw2_gone: because what we really need is beefy VMs (let's say 10*16 vCPU) for a few hours a week, not something 24/7 11:55 < bonzini> the Azure and AWS estimators both gave ~1000$/year 11:56 -!- genpaku [~genpaku@107.191.100.185] has joined #qemu 11:57 < dwmw2_gone> I have "build scripts" which launch an instance, do the build there, terminate it. Why would you need anything 24/7? :) 11:57 < dwmw2_gone> I abuse some of our test harnesses for builds 11:57 < dwmw2_gone> You can have bare metal that way, and actually get KVM. 11:58 < bonzini> dwmw2_gone: 24/7 because that's what the gitlab runners want (unless you put them on kubernetes) 11:59 -!- vliaskov [~vliaskov@dynamic-077-191-055-225.77.191.pool.telefonica.de] has quit [Remote host closed the connection] 11:59 < dwmw2_gone> Ah. Unless the gitlab runners just spawned the instance to do the test, and waited for it. They don't use many CPU minutse that way. 11:59 -!- bolt [~r00t@000182e9.user.oftc.net] has quit [Ping timeout: 480 seconds] 12:00 < stefanha> 25k mins / 60 minutes/hour = 417 hours/month @ AKS node hourly price $0.077 = $32 month (!) 
12:00 < bonzini> stefanha: danpb: i think spending 250-500 $ on GitLab CI while we set up Azure in the next couple months is workable 12:00 < stefanha> That's with small nodes similar to GitLab CI runners 12:00 < danpb> bonzini: unless we're trying to get the pipeline wallclock time shorter, we don't need really beefy VMs - gitlabs runners are quite low resources, we just use a lot in parallel 12:01 < bonzini> danpb: 10*16 vCPUs cost less than 80*2 vCPUs anyway 12:01 < stefanha> It seems the Azure quota will be fine 12:01 < stefanha> Hmm...actually I think I'm underestimating the number of instances and their size. 12:01 < danpb> bonzini i guess RAM is probably their dominating cost factor for VMs rather than CPUs 12:02 < bonzini> danpb: a bit of both 12:03 < danpb> stefanha: don't forget that our gitlab CI credits don't reflect wallclock time - there's a 0.5 cost factor - so our 50,000 credits == 100,000 wallclock minutes per month 12:03 -!- Moot [~Moo99@185.247.84.132] has quit [Read error: Connection reset by peer] 12:03 -!- bkircher [~bk@2001:a61:251f:7001:8aae:ddff:fe01:5bb2] has quit [Remote host closed the connection] 12:03 < stefanha> With the current Azure quota QEMU could spend around $500/month on Azure container service and nodes. 12:03 -!- bkircher [~bk@2001:a61:251f:7001:8aae:ddff:fe01:5bb2] has joined #qemu 12:04 < danpb> we burnt through 100,000 in about 2.5 weeks so would need to allow for perhaps another 50,000 wallclock minutes at that rate 12:04 < stefanha> danpb: I think it's still worth a shot with a $500/month budget. 12:04 < bonzini> AWS Fargate has 60000 minutes * vCPU at 60 $/month 12:04 < danpb> yeah it does seems like its worth a try to use Azure since we have the resources there going otherwise unused 12:04 < bonzini> Azure I think it was $1000/year 12:04 < bonzini> which is the same 12:04 -!- iggy [~iggy@47.152.10.131] has joined #qemu 12:05 < bonzini> Average duration: 40 minutes = 0.67 hours 12:05 < bonzini> 1,500 tasks x 1 vCPU x 0.67 hours x 0.04048 USD per hour = 40.68 USD for vCPU hours 12:05 < bonzini> 1,500 tasks x 4.00 GB x 0.67 hours x 0.004445 USD per GB per hour = 17.87 USD for GB hours 12:05 < bonzini> 40.68 USD for vCPU hours + 17.87 USD for GB hours = 58.55 USD total 12:05 < stefanha> https://makinhs.medium.com/azure-kubernetes-aks-gitlab-ci-a-short-guide-to-integrate-it-e62a4df5c86a 12:06 < bonzini> stefanha: let's ask if jeff nelson could have someone do it 12:06 < stefanha> bonzini: ok, do you want to ping him? 12:06 -!- Katje [freemadi@mail.quixotic.eu] has joined #qemu 12:06 < bonzini> yep 12:06 < stefanha> Thank you! ^ permalink raw reply [flat|nested] 19+ messages in thread
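The AKS approach discussed in the log maps onto gitlab-runner's Kubernetes executor: each CI job becomes a short-lived pod, so compute is only consumed while a pipeline is actually running, and a cluster autoscaler can add or remove nodes to follow the bursts. A rough config.toml sketch; the namespace, image and resource requests are illustrative guesses, not a tested QEMU deployment:

    [[runners]]
      name     = "qemu-aks"            # placeholder
      url      = "https://gitlab.com/"
      token    = "REDACTED"
      executor = "kubernetes"
      [runners.kubernetes]
        namespace      = "gitlab-ci"   # placeholder namespace in the cluster
        image          = "docker:24"
        privileged     = true          # container-build jobs still need dind
        cpu_request    = "1"
        memory_request = "4Gi"         # roughly matches a shared runner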
* Re: out of CI pipeline minutes again 2023-02-27 17:43 ` Stefan Hajnoczi @ 2023-03-01 4:51 ` Eldon Stegall 2023-03-01 9:53 ` Alex Bennée 2023-03-21 16:40 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Eldon Stegall @ 2023-03-01 4:51 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, QEMU Developers On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: > - Personally, I don't think this should exclude other efforts like > Eldon's. We can always add more private runners! Hi! Thanks so much to Alex, Thomas, Gerd, et al for the pointers. Although the month has passed and presumably gitlab credits have replenished, I am interested in continuing my efforts to replicate the shared runner capabilities. After some tinkering I was able to utilise Gerd's stateless runner strategy with a few changes, and had a number of tests pass in a pipeline on my repo: https://gitlab.com/eldondev/qemu/-/pipelines/791573670 Looking at the failures, it seems that some may already be addressed in patchsets, and some may be attributable to things like open file handle count, which would be useful to configure directly on the d-in-d runners, so I will investigate those after integrating the changes from the past couple of days. I have been reading through Alex's patchsets to lower CI time in the hopes that I might be able to contribute something there from my learnings on these pipelines. If there is an intent to switch to the kubernetes gitlab executor, I have worked with kubernetes a number of times in the past, and I can trial that as well. Even with the possibility of turning on Azure and avoiding these monthly crunches, maybe I can provide some help improving the turnaround time of some of the jobs themselves, once I polish off greening the remaining failures on my fork. Forgive me if I knock around a bit here while I figure out how to be useful. Best, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-01 4:51 ` Eldon Stegall @ 2023-03-01 9:53 ` Alex Bennée 0 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-03-01 9:53 UTC (permalink / raw) To: Eldon Stegall Cc: Stefan Hajnoczi, Daniel P. Berrangé, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: >> - Personally, I don't think this should exclude other efforts like >> Eldon's. We can always add more private runners! > > Hi! > Thanks so much to Alex, Thomas, Gerd, et al for the pointers. > > Although the month has passed and presumably gitlab credits have > replenished, I am interested in continuing my efforts to replicate the > shared runner capabilities. After some tinkering I was able to utilise > Gerd's stateless runner strategy with a few changes, and had a number of > tests pass in a pipeline on my repo: > > https://gitlab.com/eldondev/qemu/-/pipelines/791573670 Looking good. Eyeballing the run times they seem to be faster as well. I assume the runner is less loaded than the shared gitlab ones? > Looking at the failures, it seems that some may already be addressed in > patchsets, and some may be attributable to things like open file handle > count, which would be useful to configure directly on the d-in-d > runners, so I will investigate those after integrating the changes from > the past couple of days. > > I have been reading through Alex's patchsets to lower CI time in the > hopes that I might be able to contribute something there from my > learnings on these pipelines. If there is an intent to switch to the > kubernetes gitlab executor, I have worked with kubernetes a number of > times in the past, and I can trial that as well. I've dropped that patch for now but I might revisit once the current testing/next is done. > Even with the possibility of turning on Azure and avoiding these monthly > crunches, maybe I can provide some help improving the turnaround time of > some of the jobs themselves, once I polish off greening the remaining > failures on my fork. > > Forgive me if I knock around a bit here while I figure out how to be > useful. No problem, thanks for taking the time to look into it. -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall @ 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:18 ` Paolo Bonzini 1 sibling, 2 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-03-21 16:40 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: > Here are IRC logs from a discussion that has taken place about this > topic. Summary: > - QEMU has ~$500/month Azure credits available that could be used for CI > - Burstable VMs like Azure AKS nodes seem like a good strategy in > order to minimize hosting costs and parallelize when necessary to keep > CI duration low > - Paolo is asking for someone from Red Hat to dedicate the time to set > up Azure AKS with GitLab CI 3 weeks later... Any progress on getting Red Hat to assign someone to setup Azure for our CI ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-21 16:40 ` Daniel P. Berrangé @ 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:05 ` Alex Bennée 0 siblings, 1 reply; 19+ messages in thread From: Eldon Stegall @ 2023-03-23 5:53 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Stefan Hajnoczi, Ben Dooks, Peter Maydell, QEMU Developers On Tue, Mar 21, 2023 at 04:40:03PM +0000, Daniel P. Berrangé wrote: > 3 weeks later... Any progress on getting Red Hat to assign someone to > setup Azure for our CI ? I have the physical machine that we have offered to host for CI set up with a recent version of fcos. It isn't yet running a gitlab worker because I don't believe I have access to create a gitlab worker token for the QEMU project. If creating such a token is too much hassle, I could simply run the gitlab worker against my fork in my gitlab account, and give full access to my repo to the QEMU maintainers, so they could push to trigger jobs. If you want someone to get the gitlab kubernetes operator set up in AKS, I ended up getting a CKA cert a few years ago while working on an operator. I could probably devote some time to get that going. If any of this sounds appealing, let me know. Thanks, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
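As an aside, once a registration token is available (whether for qemu-project or a fork used for testing), hooking a machine up is a single registration step. A sketch using the docker executor; the token, description and tag are placeholders:

    gitlab-runner register \
      --non-interactive \
      --url "https://gitlab.com/" \
      --registration-token "TOKEN-FROM-PROJECT-CI-SETTINGS" \
      --executor docker \
      --docker-image "docker:24" \
      --docker-privileged \
      --description "qemu-private-x86" \
      --tag-list "qemu-private"

gitlab-runner then writes the corresponding [[runners]] entry into its config.toml itself.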
* Re: out of CI pipeline minutes again 2023-03-23 5:53 ` Eldon Stegall @ 2023-03-23 9:05 ` Alex Bennée 0 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-03-23 9:05 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Stefan Hajnoczi, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Tue, Mar 21, 2023 at 04:40:03PM +0000, Daniel P. Berrangé wrote: >> 3 weeks later... Any progress on getting Red Hat to assign someone to >> setup Azure for our CI ? > > I have the physical machine that we have offered to host for CI set up > with a recent version of fcos. > > It isn't yet running a gitlab worker because I don't believe I have > access to create a gitlab worker token for the QEMU project. Can you not see it under: https://gitlab.com/qemu-project/qemu/-/settings/ci_cd If not I can share it with you via some other out-of-band means. > If creating > such a token is too much hassle, I could simple run the gitlab worker > against my fork in my gitlab account, and give full access to my repo to > the QEMU maintainers, so they could push to trigger jobs. > > If you want someone to get the gitlab kubernetes operator set up in AKS, > I ended up getting a CKA cert a few years ago while working on an > operator. I could probably devote some time to get that going. > > If any of this sounds appealing, let me know. > > Thanks, > Eldon -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall @ 2023-03-23 9:18 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2023-03-23 9:18 UTC (permalink / raw) To: Daniel P. Berrangé, Stefan Hajnoczi, Camilla Conte Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers On 3/21/23 17:40, Daniel P. Berrangé wrote: > On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: >> Here are IRC logs from a discussion that has taken place about this >> topic. Summary: >> - QEMU has ~$500/month Azure credits available that could be used for CI >> - Burstable VMs like Azure AKS nodes seem like a good strategy in >> order to minimize hosting costs and parallelize when necessary to keep >> CI duration low >> - Paolo is asking for someone from Red Hat to dedicate the time to set >> up Azure AKS with GitLab CI > > 3 weeks later... Any progress on getting Red Hat to assign someone to > setup Azure for our CI ? Yes! Camilla Conte has been working on it and documented her progress on https://wiki.qemu.org/Testing/CI/KubernetesRunners Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall @ 2023-02-24 9:54 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2023-02-24 9:54 UTC (permalink / raw) To: Daniel P. Berrangé, Ben Dooks; +Cc: Peter Maydell, QEMU Developers On 2/23/23 16:33, Daniel P. Berrangé wrote: > On Thu, Feb 23, 2023 at 03:28:37PM +0000, Ben Dooks wrote: >> On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: >>> Hi; the project is out of gitlab CI pipeline minutes again. >>> In the absence of any other proposals, no more pull request >>> merges will happen til 1st March... >> >> Is there a way of sponsoring more minutes, could people provide >> runner resources to help? > > IIUC, we already have available compute resources from a couple of > sources we could put into service. The main issue is someone to > actually configure them to act as runners *and* maintain their > operation indefinitely going forward. The sysadmin problem is > what made/makes gitlab's shared runners so incredibly appealing. Indeed, that's the main issue. Now that GNOME is hosting download.qemu.org, we have much more freedom about how to use the credits that we get from the Azure open source sponsorship program. Currently we only have 2 VMs running but we could even reduce that to just one. Using the Kubernetes executor for GitLab would be both cheap and convenient because we would only pay (use sponsorship credits) when the CI is in progress. Using beefy containers (e.g. 20*16 vCPUs) is therefore not out of question. Unfortunately, this is not an easy thing to set up especially for people without much k8s experience. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-03-23 9:18 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:00 ` Daniel P. Berrangé 2023-02-23 15:28 ` Ben Dooks 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall 2023-03-01 9:53 ` Alex Bennée 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:05 ` Alex Bennée 2023-03-23 9:18 ` Paolo Bonzini 2023-02-24 9:54 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).