* out of CI pipeline minutes again @ 2023-02-23 12:56 Peter Maydell 2023-02-23 13:46 ` Thomas Huth 2023-02-23 15:28 ` Ben Dooks 0 siblings, 2 replies; 19+ messages in thread From: Peter Maydell @ 2023-02-23 12:56 UTC (permalink / raw) To: QEMU Developers Hi; the project is out of gitlab CI pipeline minutes again. In the absence of any other proposals, no more pull request merges will happen til 1st March... -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell @ 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:28 ` Ben Dooks 1 sibling, 2 replies; 19+ messages in thread From: Thomas Huth @ 2023-02-23 13:46 UTC (permalink / raw) To: Peter Maydell, QEMU Developers On 23/02/2023 13.56, Peter Maydell wrote: > Hi; the project is out of gitlab CI pipeline minutes again. > In the absence of any other proposals, no more pull request > merges will happen til 1st March... I'd like to propose again to send a link along with the pull request that shows that the shared runners are all green in the fork of the requester. You'd only need to check the custom runners in that case, which hopefully still work fine without CI minutes? It's definitely more cumbersome, but maybe better than queuing dozens of pull requests right in front of the soft freeze? Thomas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 13:46 ` Thomas Huth @ 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 1 sibling, 0 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 14:14 UTC (permalink / raw) To: Thomas Huth; +Cc: Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 02:46:40PM +0100, Thomas Huth wrote: > On 23/02/2023 13.56, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > I'd like to propose again to send a link along with the pull request that > shows that the shared runners are all green in the fork of the requester. > You'd only need to check the custom runners in that case, which hopefully > still work fine without CI minutes? The maintainer's fork will almost certainly not be against current HEAD though. So test results from them will not be equivalent to the tests that Peter normally does on staging, which reflects the result of merging current HEAD + the pull request. Sometimes that won't matter, but especially near freeze when we have a high volume of pull requests, I think that's an important difference to reduce risk of regressions. > It's definitely more cumbersome, but maybe better than queuing dozens of > pull requests right in front of the soft freeze? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé @ 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:00 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Warner Losh @ 2023-02-23 14:15 UTC (permalink / raw) To: Thomas Huth; +Cc: Peter Maydell, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 896 bytes --] On Thu, Feb 23, 2023, 6:48 AM Thomas Huth <thuth@redhat.com> wrote: > On 23/02/2023 13.56, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > I'd like to propose again to send a link along with the pull request that > shows that the shared runners are all green in the fork of the requester. > You'd only need to check the custom runners in that case, which hopefully > still work fine without CI minutes? > > It's definitely more cumbersome, but maybe better than queuing dozens of > pull requests right in front of the soft freeze? > Yea. I'm just getting done with my pull request and it's really demotivating to be done early and miss the boat... I'm happy to do this because it's what I do anyway before sending a pull... Warner Thomas > > > [-- Attachment #2: Type: text/html, Size: 1591 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 14:15 ` Warner Losh @ 2023-02-23 15:00 ` Daniel P. Berrangé 0 siblings, 0 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 15:00 UTC (permalink / raw) To: Warner Losh; +Cc: Thomas Huth, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 07:15:47AM -0700, Warner Losh wrote: > On Thu, Feb 23, 2023, 6:48 AM Thomas Huth <thuth@redhat.com> wrote: > > > On 23/02/2023 13.56, Peter Maydell wrote: > > > Hi; the project is out of gitlab CI pipeline minutes again. > > > In the absence of any other proposals, no more pull request > > > merges will happen til 1st March... > > > > I'd like to propose again to send a link along with the pull request that > > shows that the shared runners are all green in the fork of the requester. > > You'd only need to check the custom runners in that case, which hopefully > > still work fine without CI minutes? > > > > It's definitely more cumbersome, but maybe better than queuing dozens of > > pull requests right in front of the soft freeze? > > > > Yea. I'm just getting done with my pull request and it's really > demotivating to be done early and miss the boat... Send your pull request anyway, so it is in the queue to be handled. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell 2023-02-23 13:46 ` Thomas Huth @ 2023-02-23 15:28 ` Ben Dooks 2023-02-23 15:33 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Ben Dooks @ 2023-02-23 15:28 UTC (permalink / raw) To: Peter Maydell; +Cc: QEMU Developers On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: > Hi; the project is out of gitlab CI pipeline minutes again. > In the absence of any other proposals, no more pull request > merges will happen til 1st March... Is there a way of sponsoring more minutes, could people provide runner resources to help? -- Ben Dooks, ben@fluff.org, http://www.fluff.org/ben/ Large Hadron Colada: A large Pina Colada that makes the universe disappear. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:28 ` Ben Dooks @ 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:54 ` Paolo Bonzini 0 siblings, 2 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-23 15:33 UTC (permalink / raw) To: Ben Dooks; +Cc: Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 03:28:37PM +0000, Ben Dooks wrote: > On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: > > Hi; the project is out of gitlab CI pipeline minutes again. > > In the absence of any other proposals, no more pull request > > merges will happen til 1st March... > > Is there a way of sponsoring more minutes, could people provide > runner resources to help? IIUC, we already have available compute resources from a couple of sources we could put into service. The main issue is someone to actually configure them to act as runners *and* maintain their operation indefinitely going forward. The sysadmin problem is what made/makes gitlab's shared runners so incredibly appealing. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:33 ` Daniel P. Berrangé @ 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann ` (2 more replies) 2023-02-24 9:54 ` Paolo Bonzini 1 sibling, 3 replies; 19+ messages in thread From: Eldon Stegall @ 2023-02-23 22:11 UTC (permalink / raw) To: Daniel P. Berrangé; +Cc: Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > IIUC, we already have available compute resources from a couple of > sources we could put into service. The main issue is someone to > actually configure them to act as runners *and* maintain their > operation indefinitely going forward. The sysadmin problem is > what made/makes gitlab's shared runners so incredibly appealing. Hello, I would like to do this, but the path to contribute in this way isn't clear to me at this moment. I made it as far as creating a GitLab fork of QEMU, and then attempting to create a merge request from my branch in order to test the GitLab runner I have provisioned. Not having previously tried to contribute via GitLab, I was a bit stymied that it is not even possibly to create a merge request unless I am a member of the project? I clicked a button to request access. Alex's plan from last month sounds feasible: - provisioning scripts in scripts/ci/setup (if existing not already good enough) - tweak to handle multiple runner instances (or more -j on the build) - changes to .gitlab-ci.d/ so we can use those machines while keeping ability to run on shared runners for those outside the project Daniel, you pointed out the importance of reproducibility, and thus the use of the two-step process, build-docker, and then test-in-docker, so it seems that only docker and the gitlab agent would be strong requirements for running the jobs? I feel like the greatest win for this would be to at least host the cirrus-run jobs on a dedicated runner because the machine seems to simply be burning double minutes until the cirrus job is complete, so I would expect the GitLab runner requirements for those jobs to be low? If there are some other steps that I should take to contribute in this capacity, please let me know. Maybe I could send a patch to tag cirrus jobs in the same way that the s390x jobs are currently tagged, so that we could run those separately? Thanks, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall @ 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2 siblings, 0 replies; 19+ messages in thread From: Gerd Hoffmann @ 2023-02-24 9:16 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 10:11:11PM +0000, Eldon Stegall wrote: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > > IIUC, we already have available compute resources from a couple of > > sources we could put into service. The main issue is someone to > > actually configure them to act as runners *and* maintain their > > operation indefinitely going forward. The sysadmin problem is > > what made/makes gitlab's shared runners so incredibly appealing. I have a gitlab runner active on a hosted machine. It builds on fedora coreos and doesn't need much baby-sitting. Just copied the bits into a new repository and pushed to https://gitlab.com/kraxel/coreos-gitlab-runner > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? The above works just fine as replacement for the shared runners. Can also run in parallel to the shared runners, but it's slower on picking up jobs, so it'll effectively take over when you ran out of minutes. take care, Gerd ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann @ 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-02-24 14:07 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: >> IIUC, we already have available compute resources from a couple of >> sources we could put into service. The main issue is someone to >> actually configure them to act as runners *and* maintain their >> operation indefinitely going forward. The sysadmin problem is >> what made/makes gitlab's shared runners so incredibly appealing. > > Hello, > > I would like to do this, but the path to contribute in this way isn't clear to > me at this moment. I made it as far as creating a GitLab fork of QEMU, and then > attempting to create a merge request from my branch in order to test the GitLab > runner I have provisioned. Not having previously tried to contribute via > GitLab, I was a bit stymied that it is not even possibly to create a merge > request unless I am a member of the project? I clicked a button to request > access. We don't process merge requests and shouldn't need them to run CI. By default a pushed branch won't trigger testing so we document a way to tweak your GIT config to set the QEMU_CI environment so: git push-ci-now -f gitlab will trigger the testing. See: https://qemu.readthedocs.io/en/latest/devel/ci.html#custom-ci-cd-variables > > Alex's plan from last month sounds feasible: > > - provisioning scripts in scripts/ci/setup (if existing not already > good enough) > - tweak to handle multiple runner instances (or more -j on the build) > - changes to .gitlab-ci.d/ so we can use those machines while keeping > ability to run on shared runners for those outside the project > > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? Yeah the current provisioning scripts install packages to the host. We'd like to avoid that and use the runner inside our docker images rather than polluting the host with setup. Although in practice some hosts pull double duty and developers want to be able to replicate the setup when chasing CI errors so will likely install the packages anyway. > > I feel like the greatest win for this would be to at least host the > cirrus-run jobs on a dedicated runner because the machine seems to > simply be burning double minutes until the cirrus job is complete, so I > would expect the GitLab runner requirements for those jobs to be low? > > If there are some other steps that I should take to contribute in this > capacity, please let me know. > > Maybe I could send a patch to tag cirrus jobs in the same way that the > s390x jobs are currently tagged, so that we could run those separately? > > Thanks, > Eldon -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
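For concreteness, the push-ci-now alias Alex refers to is built on GitLab's "ci.variable" push option. The authoritative recipe and the exact meaning of QEMU_CI are in the linked docs; the sketch below assumes QEMU_CI=1 means "create the pipeline but leave it to be started manually" and QEMU_CI=2 means "create the pipeline and run it immediately":

    # assumed semantics, see the docs linked above:
    #   QEMU_CI=1  create pipeline, start jobs manually
    #   QEMU_CI=2  create pipeline and run it immediately
    git config --local alias.push-ci     "push -o ci.variable=QEMU_CI=1"
    git config --local alias.push-ci-now "push -o ci.variable=QEMU_CI=2"

    # with a 'gitlab' remote pointing at a personal fork:
    git push-ci-now -f gitlab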
* Re: out of CI pipeline minutes again 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée @ 2023-02-27 16:59 ` Daniel P. Berrangé 2023-02-27 17:43 ` Stefan Hajnoczi 2 siblings, 1 reply; 19+ messages in thread From: Daniel P. Berrangé @ 2023-02-27 16:59 UTC (permalink / raw) To: Eldon Stegall; +Cc: Ben Dooks, Peter Maydell, QEMU Developers On Thu, Feb 23, 2023 at 10:11:11PM +0000, Eldon Stegall wrote: > On Thu, Feb 23, 2023 at 03:33:00PM +0000, Daniel P. Berrangé wrote: > > IIUC, we already have available compute resources from a couple of > > sources we could put into service. The main issue is someone to > > actually configure them to act as runners *and* maintain their > > operation indefinitely going forward. The sysadmin problem is > > what made/makes gitlab's shared runners so incredibly appealing. > > Hello, > > I would like to do this, but the path to contribute in this way isn't clear to > me at this moment. I made it as far as creating a GitLab fork of QEMU, and then > attempting to create a merge request from my branch in order to test the GitLab > runner I have provisioned. Not having previously tried to contribute via > GitLab, I was a bit stymied that it is not even possibly to create a merge > request unless I am a member of the project? I clicked a button to request > access. > > Alex's plan from last month sounds feasible: > > - provisioning scripts in scripts/ci/setup (if existing not already > good enough) > - tweak to handle multiple runner instances (or more -j on the build) > - changes to .gitlab-ci.d/ so we can use those machines while keeping > ability to run on shared runners for those outside the project > > Daniel, you pointed out the importance of reproducibility, and thus the > use of the two-step process, build-docker, and then test-in-docker, so it > seems that only docker and the gitlab agent would be strong requirements for > running the jobs? Almost our entire CI setup is built around use of docker and I don't believe we really want to change that. Even ignoring GitLab, pretty much all public CI services support use of docker containers for the CI environment, so that's a defacto standard. So while git gitlab runner agent can support many different execution environments, I don't think we want to consider any except for the ones that support containers (and that would need docker-in-docker to be enabled too). Essentially we'll be using GitLab free CI credits for most of the month. What we need is some extra private CI resource that can pick up the the slack when we run out of free CI credits each month. Thus the private CI resource needs to be compatible with the public shared runners, by providing the same docker based environment[1]. It is a great shame that our current private runners ansible playbooks were not configuring thue system for use with docker, as that would have got us 90% of the way there already. One thing to bear in mind is that a typical QEMU pipeline has 130 jobs running. Each gitlab shared runner is 1 vCPU, 3.75 GB of RAM, and we're using as many as 60-70 of such instances at a time. A single physical machine probably won't cope unless it is very big. To avoid making the overall pipeline wallclock time too long, we need to be able to handle a large number of parallel jobs at certain times. We're quite peaky in our usage. Some days we merge nothing and so consume no CI. Some days we may merge many PRs and so consumes lots of CI. So buying lots of VMs to run 24x7 is quite wasteful. 
A burstable container service is quite appealing. IIUC, GitLab's shared runners use GCP's "spot" instances which are cheaper than regular instances. The downside is that the VM can get killed/descheduled if something higher priority needs Google's resources. Not too nice for reliability, but excellent for cost saving. With regards, Daniel [1] There are still several ways to achieve this. A bare metal machine with a local install of docker, or podman, vs pointing to a public k8s instance that can run containers, and possibly other options too. -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
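To make the compatibility requirement above concrete: a private runner that can stand in for the shared runners is usually a docker-executor gitlab-runner with privileged mode enabled, so the container-build (docker-in-docker) jobs keep working. A minimal config.toml sketch; the name, token, images and concurrency are placeholders, not anything the project has deployed:

    concurrent = 8                     # jobs this host will run in parallel

    [[runners]]
      name     = "qemu-private-x86"    # placeholder
      url      = "https://gitlab.com/"
      token    = "REDACTED"            # obtained when registering the runner
      executor = "docker"
      [runners.docker]
        image      = "docker:24"       # default image if a job names none
        privileged = true              # needed for docker-in-docker jobs
        volumes    = ["/cache"]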
* Re: out of CI pipeline minutes again 2023-02-27 16:59 ` Daniel P. Berrangé @ 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall 2023-03-21 16:40 ` Daniel P. Berrangé 0 siblings, 2 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2023-02-27 17:43 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers Here are IRC logs from a discussion that has taken place about this topic. Summary: - QEMU has ~$500/month Azure credits available that could be used for CI - Burstable VMs like Azure AKS nodes seem like a good strategy in order to minimize hosting costs and parallelize when necessary to keep CI duration low - Paolo is asking for someone from Red Hat to dedicate the time to set up Azure AKS with GitLab CI - Personally, I don't think this should exclude other efforts like Eldon's. We can always add more private runners! 11:31 < pm215> does anybody who understands how the CI stuff works have the time to take on the task of getting this all to work with either (a) a custom runner running on one of the hosts we've been offered or (b) whatever it needs to run using our donated azure credits ? 11:34 < danpb> i would love to, but i can't volunteer my time right now :-( 11:34 < stefanha> Here is the email thread for anyone else who's trying to catch up (like me): https://lore.kernel.org/qemu-devel/CAFEAcA83u_ENxDj3GJKa-xv6eLJGJPr_9FRDKAqm3qACyhrTgg@mail.gmail.com/ 11:34 -!- iggy [~iggy@47.152.10.131] has quit [Quit: WeeChat 3.5] 11:35 -!- peterx_ is now known as peterx 11:35 < danpb> what paolo suggested about using the Kubernetes runners for Azure seems like the ideal approach 11:35 -!- peterx [~xz@bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca] has quit [Quit: WeeChat 3.6] 11:35 -!- peterx [~xz@bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca] has joined #qemu 11:36 < danpb> as it would be most cost effective in terms of Azure resources consumed, and would scale well as it would support as many parallel runners as we can afford from our Azure allowance 11:36 < danpb> but its also probably ythe more complex option to setup 11:36 -!- dramforever [~dramforev@59.66.131.33] has quit [Ping timeout: 480 seconds] 11:37 < stefanha> It's a little sad to see the people who are volunteering to help being ignored in the email thread. 11:37 < stefanha> Ben Dooks asked about donating minutes (the easiest solution). 11:38 -!- sgarzare [~sgarzare@c-115-213.cust-q.wadsl.it] has quit [Remote host closed the connection] 11:38 < peterx> jsnow[m]: an AHCI question: where normally does the address in ahci_map_fis_address() (aka, AHCIPortRegs.is_addr[_hi]) reside? Should that be part of guest RAM that the guest AHCI driver maps? 11:38 < stefanha> eldondev is in the process of setting up a runner. 11:38 < pm215> stefanha: the problem is there is no one person who has all of (a) the authority to do stuff (b) the knowledge of what the right thing to do is (c) the time to do it... 11:38 < danpb> stefanha: the challenge is how to accept the donation in a sustainable way 11:39 < th_huth> stefanha: Do you know whether we still got that fosshost server around? ... I know, fosshost is going away, but maybe we could use it as a temporary solution at least 11:39 < stefanha> th_huth: fosshost, the organization, has ceased to operate. 
11:39 < danpb> fosshost ceased operation as a concept 11:39 < davidgiluk> I wonder if there's a way to make it so that those of us with hardware can avoid eating into the central CI count 11:39 < danpb> if we still have any access to a machine its just by luck 11:39 < stefanha> th_huth: QEMU has 2 smallish x86 VMs ready to at Oregon State University Open Source Lab. 11:40 < stefanha> (2 vCPUs and 4 GB RAM, probably not enough for private runners) 11:40 -!- amorenoz [~amorenoz@139.47.72.25] has quit [Read error: Connection reset by peer] 11:41 < peterx> jsnow[m]: the context is that someone is optimizing migration by postponing all memory updates to after some point, and the AHCI post_load is not happy because it cannot find/map the FIS address here due to delayed memory commit() (https://pastebin.com/kADnTKzp). However when I look at that I failed to see why if it's in normal RAM (but I think I had somewhere wrong) 11:41 < danpb> stefanha: fyi gitlab's shared runners are 1 vCPU, 3.75 GB of RAM by default 11:41 < stefanha> Gerd Hoffmann seems to have a hands-off/stateless private runner setup that can be scaled to multiple machines. 11:41 < stefanha> danpb: :) 11:41 < danpb> stefanha: so those two small VMs are equivalent to 2 runners, and with 30-40 jobs we run in parallel 11:41 < stefanha> The first thing that needs to be decided is which approach to take: 11:41 < jsnow[m]> peterx: I'm rusty but I think so. the guest writes an FIS (Frame Information Structure) for the card to read and operate on 11:41 < danpb> stefanha: just having two runners is going to make our pipelines take x20 longer to finish 11:41 < stefanha> 1. Donating more minutes to GitLab CI 11:42 < stefanha> 2. Using Azure to spawn runners 11:42 < stsquad> stefanha I think the main bottleneck is commissioning and admin'ing machines - but we have ansible playbooks to do it for our other custom runners so it should "just" be a case of writing one for an x86 runner 11:42 < stefanha> 3. Hosting runners on servers 11:42 < danpb> that's why i think the Azure k8s executor sounded promising - it would burst upto 20-30 jobs in parallel for the short time we run CI 11:42 < danpb> without us having to pay for 30 vms 24x7 11:42 < stsquad> who actually understands configuring k8s? 11:42 < th_huth> stefanha: I read that fosshost announcement that they will be going away ... not that they have already terminated everything ... but sure, it's not something sustainable 11:42 < stefanha> stsquad: Yep, kraxel's approach solves that because it's stateless/automated. 11:43 < peterx> jsnow[m]: thanks, then let me double check 11:43 < jsnow[m]> peterx: iirc the guest writes the address of the FIS to a register and then the pci card maps that address to read the larger command structure 11:43 < danpb> stsquad (@_oftc_stsquad:matrix.org): i don't think we'd need to configure k8s itself, just figure out how to point gitlab runner as the azure k8s service 11:44 < stefanha> danpb: Has someone calculated the cost needed to run QEMU CI on Azure? It's great that we can burst it when needed, but will the free Azure quota be enough? 11:44 -!- mmu_man [~revol@188410969.box.freepro.com] has joined #qemu 11:44 < danpb> stsquad: the problem with our current ansible playbooks is that none of them used docker AFAIR, they just setup the gitlab runer as bare metal 11:44 < peterx> jsnow[m]: yes, the thing is if that's the case the RAM should have been there when post_load() even without commit anything, so maybe there's something else I missed 11:44 < stefanha> i.e. 
will we just hit a funding wall again but on Azure instead of on GitLab? 11:44 < danpb> stefanha don't think anyone's calculated it, would hafve to ask bonzini what we actually get access to 11:45 < danpb> what would help is that we would not need azure for the whole month 11:45 < danpb> we would onl need it to fill in the gap when gitlab allowance is consumed 11:45 < stsquad> danpb I'm sure they can be re-written - I can't recall what stopped us using docker in the first place 11:46 < stsquad> but I'm a little wary of experimenting on the live CI server 11:46 < danpb> they wanted to run avocado tests which utilize some bare metal features 11:46 < stsquad> ahh that would be it 11:46 < stsquad> access to /dev/kvm 11:46 < danpb> i suggested that we set it up to expose KVM etc to the container but it wasn't done that way :-( 11:49 < stefanha> danpb: A simple estimate would be: "QEMU uses 50k CI minutes around the 20th of each month, so thats 50/20 * 10 more days = 25k CI minutes needed to cover those last 10 days" 11:49 < stefanha> Assuming GitLab CI minutes are equivalent to Azure k8s minutes 11:50 < stefanha> and then multiply 25k minutes by the Azure instance price rate. 11:51 < stefanha> ISTR the Azure quota is manually renewed by bonzini[m]. It may have been something like $10k and we use $2k of it for non-CI stuff at the moment. 11:52 < stefanha> I'm not sure if the $10k is renewed annually or semi-annually. 11:52 -!- genpaku_ [~genpaku@107.191.100.185] has quit [Read error: Connection reset by peer] 11:52 < stefanha> So maybe $8k available per year. 11:52 < dwmw2_gone> I feel I ought to be able to round up some VM instances too. 11:53 -!- farosas [~farosas@177.103.113.244] has quit [Quit: Leaving] 11:53 -!- farosas [~farosas@177.103.113.244] has joined #qemu 11:54 < bonzini> stefanha: right, more like $3k to be safe 11:54 < bonzini> dwmw2_gone: the right thing to do would be to set up kubernetes/Fargate 11:54 < bonzini> same for Azure 11:54 -!- zzhu [~zzhu@072-182-049-214.res.spectrum.com] has quit [Remote host closed the connection] 11:55 < bonzini> dwmw2_gone: because what we really need is beefy VMs (let's say 10*16 vCPU) for a few hours a week, not something 24/7 11:55 < bonzini> the Azure and AWS estimators both gave ~1000$/year 11:56 -!- genpaku [~genpaku@107.191.100.185] has joined #qemu 11:57 < dwmw2_gone> I have "build scripts" which launch an instance, do the build there, terminate it. Why would you need anything 24/7? :) 11:57 < dwmw2_gone> I abuse some of our test harnesses for builds 11:57 < dwmw2_gone> You can have bare metal that way, and actually get KVM. 11:58 < bonzini> dwmw2_gone: 24/7 because that's what the gitlab runners want (unless you put them on kubernetes) 11:59 -!- vliaskov [~vliaskov@dynamic-077-191-055-225.77.191.pool.telefonica.de] has quit [Remote host closed the connection] 11:59 < dwmw2_gone> Ah. Unless the gitlab runners just spawned the instance to do the test, and waited for it. They don't use many CPU minutse that way. 11:59 -!- bolt [~r00t@000182e9.user.oftc.net] has quit [Ping timeout: 480 seconds] 12:00 < stefanha> 25k mins / 60 minutes/hour = 417 hours/month @ AKS node hourly price $0.077 = $32 month (!) 
12:00 < bonzini> stefanha: danpb: i think spending 250-500 $ on GitLab CI while we set up Azure in the next couple months is workable 12:00 < stefanha> That's with small nodes similar to GitLab CI runners 12:00 < danpb> bonzini: unless we're trying to get the pipeline wallclock time shorter, we don't need really beefy VMs - gitlabs runners are quite low resources, we just use a lot in parallel 12:01 < bonzini> danpb: 10*16 vCPUs cost less than 80*2 vCPUs anyway 12:01 < stefanha> It seems the Azure quota will be fine 12:01 < stefanha> Hmm...actually I think I'm underestimating the number of instances and their size. 12:01 < danpb> bonzini i guess RAM is probably their dominating cost factor for VMs rather than CPUs 12:02 < bonzini> danpb: a bit of both 12:03 < danpb> stefanha: don't forget that our gitlab CI credits don't reflect wallclock time - there's a 0.5 cost factor - so our 50,000 credits == 100,000 wallclock minutes per month 12:03 -!- Moot [~Moo99@185.247.84.132] has quit [Read error: Connection reset by peer] 12:03 -!- bkircher [~bk@2001:a61:251f:7001:8aae:ddff:fe01:5bb2] has quit [Remote host closed the connection] 12:03 < stefanha> With the current Azure quota QEMU could spend around $500/month on Azure container service and nodes. 12:03 -!- bkircher [~bk@2001:a61:251f:7001:8aae:ddff:fe01:5bb2] has joined #qemu 12:04 < danpb> we burnt through 100,000 in about 2.5 weeks so would need to allow for perhaps another 50,000 wallclock minutes at that rate 12:04 < stefanha> danpb: I think it's still worth a shot with a $500/month budget. 12:04 < bonzini> AWS Fargate has 60000 minutes * vCPU at 60 $/month 12:04 < danpb> yeah it does seems like its worth a try to use Azure since we have the resources there going otherwise unused 12:04 < bonzini> Azure I think it was $1000/year 12:04 < bonzini> which is the same 12:04 -!- iggy [~iggy@47.152.10.131] has joined #qemu 12:05 < bonzini> Average duration: 40 minutes = 0.67 hours 12:05 < bonzini> 1,500 tasks x 1 vCPU x 0.67 hours x 0.04048 USD per hour = 40.68 USD for vCPU hours 12:05 < bonzini> 1,500 tasks x 4.00 GB x 0.67 hours x 0.004445 USD per GB per hour = 17.87 USD for GB hours 12:05 < bonzini> 40.68 USD for vCPU hours + 17.87 USD for GB hours = 58.55 USD total 12:05 < stefanha> https://makinhs.medium.com/azure-kubernetes-aks-gitlab-ci-a-short-guide-to-integrate-it-e62a4df5c86a 12:06 < bonzini> stefanha: let's ask if jeff nelson could have someone do it 12:06 < stefanha> bonzini: ok, do you want to ping him? 12:06 -!- Katje [freemadi@mail.quixotic.eu] has joined #qemu 12:06 < bonzini> yep 12:06 < stefanha> Thank you! ^ permalink raw reply [flat|nested] 19+ messages in thread
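The AKS approach discussed in the log maps onto gitlab-runner's Kubernetes executor: each CI job becomes a short-lived pod, so compute is only consumed while a pipeline is actually running, and a cluster autoscaler can add or remove nodes to follow the bursts. A rough config.toml sketch; the namespace, image and resource requests are illustrative guesses, not a tested QEMU deployment:

    [[runners]]
      name     = "qemu-aks"            # placeholder
      url      = "https://gitlab.com/"
      token    = "REDACTED"
      executor = "kubernetes"
      [runners.kubernetes]
        namespace      = "gitlab-ci"   # placeholder namespace in the cluster
        image          = "docker:24"
        privileged     = true          # container-build jobs still need dind
        cpu_request    = "1"
        memory_request = "4Gi"         # roughly matches a shared runner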
* Re: out of CI pipeline minutes again 2023-02-27 17:43 ` Stefan Hajnoczi @ 2023-03-01 4:51 ` Eldon Stegall 2023-03-01 9:53 ` Alex Bennée 2023-03-21 16:40 ` Daniel P. Berrangé 1 sibling, 1 reply; 19+ messages in thread From: Eldon Stegall @ 2023-03-01 4:51 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Daniel P. Berrangé, Ben Dooks, Peter Maydell, QEMU Developers On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: > - Personally, I don't think this should exclude other efforts like > Eldon's. We can always add more private runners! Hi! Thanks so much to Alex, Thomas, Gerd, et al for the pointers. Although the month has passed and presumably gitlab credits have replenished, I am interested in continuing my efforts to replicate the shared runner capabilities. After some tinkering I was able to utilise Gerd's stateless runner strategy with a few changes, and had a number of tests pass in a pipeline on my repo: https://gitlab.com/eldondev/qemu/-/pipelines/791573670 Looking at the failures, it seems that some may already be addressed in patchsets, and some may be attributable to things like open file handle count, which would be useful to configure directly on the d-in-d runners, so I will investigate those after integrating the changes from the past couple of days. I have been reading through Alex's patchsets to lower CI time in the hopes that I might be able to contribute something there from my learnings on these pipelines. If there is an intent to switch to the kubernetes gitlab executor, I have worked with kubernetes a number of times in the past, and I can trial that as well. Even with the possibility of turning on Azure and avoiding these monthly crunches, maybe I can provide some help improving the turnaround time of some of the jobs themselves, once I polish off greening the remaining failures on my fork. Forgive me if I knock around a bit here while I figure out how to be useful. Best, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-01 4:51 ` Eldon Stegall @ 2023-03-01 9:53 ` Alex Bennée 0 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-03-01 9:53 UTC (permalink / raw) To: Eldon Stegall Cc: Stefan Hajnoczi, Daniel P. Berrangé, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: >> - Personally, I don't think this should exclude other efforts like >> Eldon's. We can always add more private runners! > > Hi! > Thanks so much to Alex, Thomas, Gerd, et al for the pointers. > > Although the month has passed and presumably gitlab credits have > replenished, I am interested in continuing my efforts to replicate the > shared runner capabilities. After some tinkering I was able to utilise > Gerd's stateless runner strategy with a few changes, and had a number of > tests pass in a pipeline on my repo: > > https://gitlab.com/eldondev/qemu/-/pipelines/791573670 Looking good. Eyeballing the run times they seem to be faster as well. I assume the runner is less loaded than the shared gitlab ones? > Looking at the failures, it seems that some may already be addressed in > patchsets, and some may be attributable to things like open file handle > count, which would be useful to configure directly on the d-in-d > runners, so I will investigate those after integrating the changes from > the past couple of days. > > I have been reading through Alex's patchsets to lower CI time in the > hopes that I might be able to contribute something there from my > learnings on these pipelines. If there is an intent to switch to the > kubernetes gitlab executor, I have worked with kubernetes a number of > times in the past, and I can trial that as well. I've dropped that patch for now but I might revisit once the current testing/next is done. > Even with the possibility of turning on Azure and avoiding these monthly > crunches, maybe I can provide some help improving the turnaround time of > some of the jobs themselves, once I polish off greening the remaining > failures on my fork. > > Forgive me if I knock around a bit here while I figure out how to be > useful. No problem, thanks for taking the time to look into it. -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall @ 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:18 ` Paolo Bonzini 1 sibling, 2 replies; 19+ messages in thread From: Daniel P. Berrangé @ 2023-03-21 16:40 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: > Here are IRC logs from a discussion that has taken place about this > topic. Summary: > - QEMU has ~$500/month Azure credits available that could be used for CI > - Burstable VMs like Azure AKS nodes seem like a good strategy in > order to minimize hosting costs and parallelize when necessary to keep > CI duration low > - Paolo is asking for someone from Red Hat to dedicate the time to set > up Azure AKS with GitLab CI 3 weeks later... Any progress on getting Red Hat to assign someone to setup Azure for our CI ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-21 16:40 ` Daniel P. Berrangé @ 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:05 ` Alex Bennée 0 siblings, 1 reply; 19+ messages in thread From: Eldon Stegall @ 2023-03-23 5:53 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Stefan Hajnoczi, Ben Dooks, Peter Maydell, QEMU Developers On Tue, Mar 21, 2023 at 04:40:03PM +0000, Daniel P. Berrangé wrote: > 3 weeks later... Any progress on getting Red Hat to assign someone to > setup Azure for our CI ? I have the physical machine that we have offered to host for CI set up with a recent version of fcos. It isn't yet running a gitlab worker because I don't believe I have access to create a gitlab worker token for the QEMU project. If creating such a token is too much hassle, I could simply run the gitlab worker against my fork in my gitlab account, and give full access to my repo to the QEMU maintainers, so they could push to trigger jobs. If you want someone to get the gitlab kubernetes operator set up in AKS, I ended up getting a CKA cert a few years ago while working on an operator. I could probably devote some time to get that going. If any of this sounds appealing, let me know. Thanks, Eldon ^ permalink raw reply [flat|nested] 19+ messages in thread
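As an aside, once a registration token is available (whether for qemu-project or a fork used for testing), hooking a machine up is a single registration step. A sketch using the docker executor; the token, description and tag are placeholders:

    gitlab-runner register \
      --non-interactive \
      --url "https://gitlab.com/" \
      --registration-token "TOKEN-FROM-PROJECT-CI-SETTINGS" \
      --executor docker \
      --docker-image "docker:24" \
      --docker-privileged \
      --description "qemu-private-x86" \
      --tag-list "qemu-private"

gitlab-runner then writes the corresponding [[runners]] entry into its config.toml itself.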
* Re: out of CI pipeline minutes again 2023-03-23 5:53 ` Eldon Stegall @ 2023-03-23 9:05 ` Alex Bennée 0 siblings, 0 replies; 19+ messages in thread From: Alex Bennée @ 2023-03-23 9:05 UTC (permalink / raw) To: Eldon Stegall Cc: Daniel P. Berrangé, Stefan Hajnoczi, Ben Dooks, Peter Maydell, qemu-devel Eldon Stegall <eldon-qemu@eldondev.com> writes: > On Tue, Mar 21, 2023 at 04:40:03PM +0000, Daniel P. Berrangé wrote: >> 3 weeks later... Any progress on getting Red Hat to assign someone to >> setup Azure for our CI ? > > I have the physical machine that we have offered to host for CI set up > with a recent version of fcos. > > It isn't yet running a gitlab worker because I don't believe I have > access to create a gitlab worker token for the QEMU project. Can you not see it under: https://gitlab.com/qemu-project/qemu/-/settings/ci_cd If not I can share it with you via some other out-of-band means. > If creating > such a token is too much hassle, I could simple run the gitlab worker > against my fork in my gitlab account, and give full access to my repo to > the QEMU maintainers, so they could push to trigger jobs. > > If you want someone to get the gitlab kubernetes operator set up in AKS, > I ended up getting a CKA cert a few years ago while working on an > operator. I could probably devote some time to get that going. > > If any of this sounds appealing, let me know. > > Thanks, > Eldon -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall @ 2023-03-23 9:18 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2023-03-23 9:18 UTC (permalink / raw) To: Daniel P. Berrangé, Stefan Hajnoczi, Camilla Conte Cc: Eldon Stegall, Ben Dooks, Peter Maydell, QEMU Developers On 3/21/23 17:40, Daniel P. Berrangé wrote: > On Mon, Feb 27, 2023 at 12:43:55PM -0500, Stefan Hajnoczi wrote: >> Here are IRC logs from a discussion that has taken place about this >> topic. Summary: >> - QEMU has ~$500/month Azure credits available that could be used for CI >> - Burstable VMs like Azure AKS nodes seem like a good strategy in >> order to minimize hosting costs and parallelize when necessary to keep >> CI duration low >> - Paolo is asking for someone from Red Hat to dedicate the time to set >> up Azure AKS with GitLab CI > > 3 weeks later... Any progress on getting Red Hat to assign someone to > setup Azure for our CI ? Yes! Camilla Conte has been working on it and documented her progress on https://wiki.qemu.org/Testing/CI/KubernetesRunners Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: out of CI pipeline minutes again 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall @ 2023-02-24 9:54 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2023-02-24 9:54 UTC (permalink / raw) To: Daniel P. Berrangé, Ben Dooks; +Cc: Peter Maydell, QEMU Developers On 2/23/23 16:33, Daniel P. Berrangé wrote: > On Thu, Feb 23, 2023 at 03:28:37PM +0000, Ben Dooks wrote: >> On Thu, Feb 23, 2023 at 12:56:56PM +0000, Peter Maydell wrote: >>> Hi; the project is out of gitlab CI pipeline minutes again. >>> In the absence of any other proposals, no more pull request >>> merges will happen til 1st March... >> >> Is there a way of sponsoring more minutes, could people provide >> runner resources to help? > > IIUC, we already have available compute resources from a couple of > sources we could put into service. The main issue is someone to > actually configure them to act as runners *and* maintain their > operation indefinitely going forward. The sysadmin problem is > what made/makes gitlab's shared runners so incredibly appealing. Indeed, that's the main issue. Now that GNOME is hosting download.qemu.org, we have much more freedom about how to use the credits that we get from the Azure open source sponsorship program. Currently we only have 2 VMs running but we could even reduce that to just one. Using the Kubernetes executor for GitLab would be both cheap and convenient because we would only pay (use sponsorship credits) when the CI is in progress. Using beefy containers (e.g. 20*16 vCPUs) is therefore not out of question. Unfortunately, this is not an easy thing to set up especially for people without much k8s experience. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-03-23 9:18 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-23 12:56 out of CI pipeline minutes again Peter Maydell 2023-02-23 13:46 ` Thomas Huth 2023-02-23 14:14 ` Daniel P. Berrangé 2023-02-23 14:15 ` Warner Losh 2023-02-23 15:00 ` Daniel P. Berrangé 2023-02-23 15:28 ` Ben Dooks 2023-02-23 15:33 ` Daniel P. Berrangé 2023-02-23 22:11 ` Eldon Stegall 2023-02-24 9:16 ` Gerd Hoffmann 2023-02-24 14:07 ` Alex Bennée 2023-02-27 16:59 ` Daniel P. Berrangé 2023-02-27 17:43 ` Stefan Hajnoczi 2023-03-01 4:51 ` Eldon Stegall 2023-03-01 9:53 ` Alex Bennée 2023-03-21 16:40 ` Daniel P. Berrangé 2023-03-23 5:53 ` Eldon Stegall 2023-03-23 9:05 ` Alex Bennée 2023-03-23 9:18 ` Paolo Bonzini 2023-02-24 9:54 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).