* Azure infrastructure update

From: Paolo Bonzini @ 2023-06-28 10:44 UTC
To: qemu
Cc: qemu-devel, Camilla Conte, Richard Henderson, Daniel P. Berrangé,
    Thomas Huth, Markus Armbruster

Hi all,

a small update on the infrastructure we have set up on Azure and the
expected costs. Remember that we have $10000/year in credits from the
Microsoft open source program, so the actual cost to the project is
zero unless we exceed that threshold.

Historically, QEMU's infrastructure was hosted on virtual machines
sponsored by Rackspace's open source infrastructure program. When the
program was abruptly terminated, QEMU faced a cost of roughly
$1500/month, mostly due to bandwidth. As an initial step to cut these
costs, downloads were moved to Azure. However, bandwidth costs remained
high; in 2022 we exceeded the sponsorship credits and had to pay
roughly $4000 to Microsoft, in addition to roughly $2000 for VMs that
were still hosted on Rackspace. While not a definitive solution, this
still saved the project an expense of over $10000.

Fortunately, the GNOME project stepped in and offered to host downloads
for QEMU on their CDN. This freed up all the Azure credits for more
interesting uses. In particular, Stefan and I moved the Rackspace VMs
over to Azure, after which the Rackspace bill went down to zero. This
resulted in two VMs, both running CentOS Stream 9:

- a larger one (E2s instance type) for Patchew and wiki.qemu.org,
  costing ~$1900/year between VM and disks. The websites on this VM are
  implemented as podman containers plus a simple nginx front-end on
  ports 80/443.

- a smaller one (D2s instance type) that proxies qemu.org and
  git.qemu.org to GitLab and provides an SSH mirror of the QEMU
  downloads, costing $1200/year between VM and disks. This was a more
  traditional monolithic setup.

We also have two virtual machines from OSUOSL (Oregon State University
Open Source Lab); one is unused and can be decommissioned, while the
other (also running CentOS Stream 9) runs the Patchew background jobs
that import patches and apply them.

Last April, Camilla Conte also added Kubernetes-based private runners
for QEMU CI to our Azure setup. Private runners avoid hitting the
GitLab limits on shared runners and shorten the time it takes to run
individual test jobs. This is because CI, thanks to its bursty nature,
can use larger VMs than "pet" VMs such as the ones above. Currently we
are using 8 vCPU / 32 GB VMs for the Kubernetes nodes, and each job is
assigned 4 vCPUs.

Starting June 1, all pipelines running in qemu-project/qemu have been
using the private runners. Besides benefiting from the higher number
of vCPUs per job, this leaves the GitLab shared runner allowance for
Windows jobs as well as updates to qemu-web. It also made it possible
to estimate the cost of running Linux jobs on Azure at all times, and
to compare the costs with the credits that are made available through
the sponsorship.

Finally, earlier this month I noticed that the OSUOSL mirror for
download.qemu.org was not being updated. Therefore, I rebuilt the
qemu.org and git.qemu.org proxies as containers and moved them to the
same VM that runs Patchew, wiki.qemu.org and now the KVM Forum website
too. This made it possible to delete the second VM mentioned above. We
will re-evaluate how to provide the source for mirroring
download.qemu.org.
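[Editorial note: purely for illustration, the per-job sizing described
above could be expressed roughly as follows if the runners were
deployed with the gitlab-runner Helm chart's Kubernetes executor. These
values are a sketch, not the actual deployment.]

# Hypothetical gitlab-runner Helm values - not the real QEMU deployment.
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # Each CI job pod requests 4 vCPUs, so at most two jobs are
        # packed onto an 8 vCPU / 32 GB node.
        cpu_request = "4"
        # Leave headroom for the kubelet and system pods on the node.
        memory_request = "14Gi"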
Our consumption of Azure credits was as follows:

* $2005 as of Jun 1, of which $371 used for the now-deleted D2s VM
* $2673 as of Jun 28, of which $457 used for the now-deleted D2s VM

Based on the credits consumed from Jun 1 to Jun 28, which should be
representative of normal resource use, I am estimating the Azure costs
as follows:

$6700 for this year, of which:
- $1650 for the E2s VM
- $450 for the now-deleted D2s VM
- $1600 for the Kubernetes compute nodes
- $2500 for AKS (Azure Kubernetes Service) including system nodes,
  load balancing, monitoring and a few more itemized services(*)
- $500 for bandwidth and IP address allocation

$7800 starting next year, of which:
- $1900 for the E2s VM
- $2250 for the Kubernetes compute nodes
- $3100 for AKS-related services
- $550 for bandwidth and IP address allocation

This fits within the allowance of the Azure open source credits
program, while leaving some leeway in case of increased costs or
increased usage of the private runners. As a contingency plan in case
costs surge, we can always disable usage of the private runners and
revert to wider usage of shared runners.

That said, the cost for the compute nodes is not small. In particular,
at the last QEMU Summit we discussed the possibility of adopting a
merge request workflow for maintainer pull requests. These merge
requests would replace the pipelines that are run by committers as
part of merging trees, and therefore should not introduce excessive
costs. However, as things stand, in case of a more generalized
adoption of GitLab MRs(**) the QEMU project will *not* be able to
shoulder the cost of running our (pretty expensive) CI on private
runners for all merge requests.

Thanks,

Paolo

(*) not that we use any of this, but they are added automatically when
    you set up AKS

(**) which was NOT considered at QEMU Summit
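[Editorial note: a back-of-envelope check of how these totals follow
from the Jun 1 -> Jun 28 numbers above; the 27-day window, the 186 days
left in 2023 and the 365-day year are my assumptions, not Paolo's
stated method.]

\[
\begin{aligned}
\text{run rate excl. D2s} &\approx \frac{(2673-2005)-(457-371)}{27} \approx \$21.6\ \text{per day} \\
\text{rest of 2023} &\approx 2673 + 186 \times 21.6 \approx \$6700 \\
\text{full year} &\approx 365 \times 21.6 \approx \$7900
\end{aligned}
\]

Both results land close to the $6700 and $7800 figures above.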
* Re: Azure infrastructure update

From: Daniel P. Berrangé @ 2023-06-28 11:28 UTC
To: Paolo Bonzini
Cc: qemu, qemu-devel, Camilla Conte, Richard Henderson, Thomas Huth,
    Markus Armbruster

On Wed, Jun 28, 2023 at 12:44:33PM +0200, Paolo Bonzini wrote:
> Starting June 1, all pipelines running in qemu-project/qemu have been
> using the private runners. Besides benefiting from the higher number
> of vCPUs per job, this leaves the GitLab shared runner allowance for
> Windows jobs as well as updates to qemu-web.

Also the python-qemu-qmp.git CI is on shared runners currently.

> Our consumption of Azure credits was as follows:
>
> * $2005 as of Jun 1, of which $371 used for the now-deleted D2s VM
> * $2673 as of Jun 28, of which $457 used for the now-deleted D2s VM
>
> Based on the credits consumed from Jun 1 to Jun 28, which should be
> representative of normal resource use, I am estimating the Azure
> costs as follows:

The only caveat is that June did not coincide with a soft freeze. My
impression is that our CI pipeline usage spikes in the weeks around a
freeze.

> $6700 for this year, of which:
> - $1650 for the E2s VM
> - $450 for the now-deleted D2s VM
> - $1600 for the Kubernetes compute nodes
> - $2500 for AKS (Azure Kubernetes Service) including system nodes,
>   load balancing, monitoring and a few more itemized services(*)
> - $500 for bandwidth and IP address allocation
>
> $7800 starting next year, of which:
> - $1900 for the E2s VM

Same size VM as this year, but costing more? Is this simply you
anticipating possible price increases from Azure?

> - $2250 for the Kubernetes compute nodes

IIUC, the $1600 from this year will cover about 7.5 months' worth of
usage (Jun -> Dec), which would imply more like $2500 for a full 12
months, possibly more if we add in the peaks around soft freeze. IOW it
could conceivably get closer to the $3k mark without much difficulty,
especially if we also start doing more pipelines for stable branches on
a regular basis, now that we have CI working properly for stable.

> - $3100 for AKS-related services

Same question about anticipated prices?

> - $550 for bandwidth and IP address allocation
>
> This fits within the allowance of the Azure open source credits
> program, while leaving some leeway in case of increased costs or
> increased usage of the private runners. As a contingency plan in case
> costs surge, we can always disable usage of the private runners and
> revert to wider usage of shared runners.

We also still have Eldondev's physical machine set up as a runner,
assuming that's going to be available indefinitely if we need the
resource.

> That said, the cost for the compute nodes is not small. In particular,
> at the last QEMU Summit we discussed the possibility of adopting a
> merge request workflow for maintainer pull requests. These merge
> requests would replace the pipelines that are run by committers as
> part of merging trees, and therefore should not introduce excessive
> costs.

Depending on how we set up the CI workflow, it might increase our
usage, potentially doubling it quite easily.

Right now, whoever is doing CI for merging pull requests is fully
serializing CI pipelines, so what's tested 100% matches what is merged
to master.

With a merge request workflow it can be slightly different, based on a
couple of variables.
When pushing a branch to their fork, prior to opening the merge
request, CI credits are burnt in their fork for every push, based on
whatever is the HEAD of their branch. This might be behind current
upstream 'master' by some amount.

Typically when using merge requests though, you would change the
gitlab CI workflow rules to trigger CI pipelines from merge request
actions, instead of branch push actions.

If we do this, then when opening a merge request, an initial pipeline
would be triggered.

If-and-only-if the maintainer has "Developer" on gitlab.com/qemu-project,
then that merge request initial pipeline will burn upstream CI credits.

If they are not a "Developer", it will burn their own fork credits. If
they don't have any credits left, then someone with "Developer" role
will have to spawn a pipeline on their behalf, which will run in
upstream context and burn upstream credits. The latter is tedious, so
I think the expectation is that anyone who submits pull requests would
have the 'Developer' role on qemu-project. We want that anyway really,
so we can tag maintainers in issues on gitlab too.

IOW, assume that any maintainer opening a merge req will be burning
upstream CI credits on their merge request pipelines.

This initial pipeline will run against a merge commit that grafts the
head of the pull request onto 'master' as of the time the pipeline was
triggered. In a default config, if we apply the merge request at that
point it would go into master with no further pipeline run.

Merge requests are not serialized though. So if a second merge request
had been applied to master after the time the first merge request
pipeline started, the pipeline for the first merge request is
potentially invalid. Compared to our use of the (serialized) pipelines
on the 'staging' branch, this setup would be a regression in coverage.

To address this would require using GitLab's "merge trains" feature.

When merge trains are enabled and someone hits the button to apply a
merge request to master, an *additional* CI pipeline is started based
on the exact content that will be applied to master. Crucially, as
the name hints, the merge train pipelines are serialized. IOW, if you
request to apply 4 merge requests in quick succession, a queue of
pipelines will be created and run one after the other. If any
pipeline fails, that MR is kicked out of the queue, and the following
pipelines carry on.

IOW, the merge trains feature replicates what we achieve with the
serialized 'staging' branch.

What you can see here, though, is that every merge request will have
at least 2 pipelines - one when the MR is opened, and one when it is
applied to master - both consuming upstream CI credits.

IOW, we potentially double our CI usage in this model if we don't
make any changes to how CI pipelines are triggered.

Essentially the idea with merge requests is that the initial pipeline
upon opening the merge request does full validation and catches all
the silly stuff. Failures are OK because this is all parallelized with
other MRs, so failures don't delay anything/anyone else. The merge
train is then the safety net to prove the original pipeline results
are still valid for the current HEAD at the time of applying it. You
want the merge train pipelines to essentially never fail, as that's
disruptive to anything following on.

If we can afford the CI credits, I'd keep things simple and just
accept the increased CI burn, but with your figures above I fear we'd
be too close to the limit to be relaxed about it.
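[Editorial note: the workflow-rules change described above could look
roughly like the following GitLab CI fragment. The workflow:rules: keys
and the $CI_* variables are standard GitLab CI; the 'staging' branch
name and the choice of which pushes keep triggering pipelines are
illustrative assumptions, not the current qemu .gitlab-ci.yml.]

# Hypothetical fragment of a .gitlab-ci.yml - illustration only.
workflow:
  rules:
    # Start a pipeline when a merge request is opened or updated; this
    # is the "initial" MR pipeline discussed above.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    # Keep the existing push-triggered pipelines on the staging branch
    # used by the committers today.
    - if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "staging"'
    # No pipelines for other branch pushes.
    - when: never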
The extra Eldondev runner could come into play here possibly.

If we can't afford the double pipelines, then we would have to write
our GitLab CI yml rules to exclude the initial pipeline, or just do a
very minimalist "smoke test", and focus the bulk of CI usage on the
merge train pipeline.

This is all solvable in one way or another. We just need to figure
out the right tradeoffs we want.

> However, as things stand, in case of a more generalized
> adoption of GitLab MRs(**) the QEMU project will *not* be able to
> shoulder the cost of running our (pretty expensive) CI on private
> runners for all merge requests.

With more generalized adoption of an MR workflow for all
contributions, bear in mind that many of the contributors will NOT
have the 'Developer' role on gitlab.com/qemu-project. Thus their
merge request pipelines would run in fork context and consume their
own CI credits, unless a "Developer" manually triggered a pipeline on
their behalf.

So yes, I agree that full adoption of MRs would definitely increase our
CI usage, but not by quite such a horrendous amount as you might first
think. We would definitely need more resources whichever way you look
at it though.

With regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
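[Editorial note: one hedged way to express the "minimal smoke test on
the initial MR pipeline, full CI only on the merge train" split that
Daniel describes, using the standard $CI_MERGE_REQUEST_EVENT_TYPE
variable. Job names and scripts are placeholders, not QEMU's actual CI
jobs.]

# Hypothetical fragment - job names and scripts are placeholders.
mr-smoke-test:
  stage: test
  rules:
    # Initial MR pipelines only, never the merge train.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_EVENT_TYPE != "merge_train"'
  script:
    - echo "cheap checks only, e.g. style and docs"   # placeholder

full-build-and-test:
  stage: test
  rules:
    # Burn the expensive runners only once the MR enters the merge train.
    - if: '$CI_MERGE_REQUEST_EVENT_TYPE == "merge_train"'
  script:
    - mkdir build && cd build
    - ../configure
    - make -j"$(nproc)"
    - make check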
* Re: Azure infrastructure update

From: Paolo Bonzini @ 2023-06-28 11:41 UTC
To: Daniel P. Berrangé
Cc: qemu, qemu-devel, Camilla Conte, Richard Henderson, Thomas Huth,
    Markus Armbruster

On Wed, Jun 28, 2023 at 1:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > $6700 for this year, of which:
> > - $1650 for the E2s VM
> > - $450 for the now-deleted D2s VM
> > - $1600 for the Kubernetes compute nodes
> > - $2500 for AKS (Azure Kubernetes Service) including system nodes,
> >   load balancing, monitoring and a few more itemized services(*)
> > - $500 for bandwidth and IP address allocation
> >
> > $7800 starting next year, of which:
> > - $1900 for the E2s VM
>
> Same size VM as this year, but costing more? Is this simply you
> anticipating possible price increases from Azure?

No, it's just ~11 vs. 12 months, because we didn't set it up from the
first day.

> > - $2250 for the Kubernetes compute nodes
>
> IIUC, the $1600 from this year will cover about 7.5 months' worth of
> usage (Jun -> Dec), which would imply more like $2500 for a full 12
> months, possibly more if we add in the peaks around soft freeze. IOW
> it could conceivably get closer to the $3k mark without much
> difficulty, especially if we also start doing more pipelines for
> stable branches on a regular basis, now that we have CI working
> properly for stable.

It also covers several weeks in May. In any case I have saved the
broken-down data and will redo the estimate after 4 months (Jun 1 ->
Sep 30, covering both a soft freeze and a hard freeze).

> > - $3100 for AKS-related services
>
> Same question about anticipated prices?

Same answer about 9-10 months vs. 12. :)

> > That said, the cost for the compute nodes is not small. In particular,
> > at the last QEMU Summit we discussed the possibility of adopting a
> > merge request workflow for maintainer pull requests. These merge
> > requests would replace the pipelines that are run by committers as
> > part of merging trees, and therefore should not introduce excessive
> > costs.
>
> Depending on how we set up the CI workflow, it might increase our
> usage, potentially doubling it quite easily.
>
> Right now, whoever is doing CI for merging pull requests is fully
> serializing CI pipelines, so what's tested 100% matches what is
> merged to master.
>
> With a merge request workflow it can be slightly different, based on
> a couple of variables.
>
> When pushing a branch to their fork, prior to opening the merge
> request, CI credits are burnt in their fork for every push, based on
> whatever is the HEAD of their branch. This might be behind current
> upstream 'master' by some amount.
>
> Typically when using merge requests though, you would change the
> gitlab CI workflow rules to trigger CI pipelines from merge request
> actions, instead of branch push actions.

Yes, that was my idea as well.

> If we do this, then when opening a merge request, an initial pipeline
> would be triggered.
>
> If-and-only-if the maintainer has "Developer" on gitlab.com/qemu-project,
> then that merge request initial pipeline will burn upstream CI credits.
>
> If they are not a "Developer", it will burn their own fork credits. If
> they don't have any credits left, then someone with "Developer" role
> will have to spawn a pipeline on their behalf, which will run in
> upstream context and burn upstream credits.
> The latter is tedious, so I think the expectation is that anyone who
> submits pull requests would have the 'Developer' role on
> qemu-project. We want that anyway really, so we can tag maintainers
> in issues on gitlab too.

Agreed. Is there no option to have the "Developer" use his own credits?

> IOW, assume that any maintainer opening a merge req will be burning
> upstream CI credits on their merge request pipelines. [...]
> Merge requests are not serialized though. [...]
> To address this would require using GitLab's "merge trains" feature.
>
> When merge trains are enabled and someone hits the button to apply a
> merge request to master, an *additional* CI pipeline is started based
> on the exact content that will be applied to master. Crucially, as
> the name hints, the merge train pipelines are serialized. IOW, if you
> request to apply 4 merge requests in quick succession, a queue of
> pipelines will be created and run one after the other. If any
> pipeline fails, that MR is kicked out of the queue, and the following
> pipelines carry on.
>
> IOW, the merge trains feature replicates what we achieve with the
> serialized 'staging' branch.
>
> What you can see here, though, is that every merge request will have
> at least 2 pipelines - one when the MR is opened, and one when it is
> applied to master - both consuming upstream CI credits.
>
> IOW, we potentially double our CI usage in this model if we don't
> make any changes to how CI pipelines are triggered. [...]
> If we can afford the CI credits, I'd keep things simple and just
> accept the increased CI burn, but with your figures above I fear we'd
> be too close to the limit to be relaxed about it.

Hmm, now that I think about it, I'm not sure the merge request CI would
use private runners. Would it use the CI variables that are set in
settings/ci_cd? If not, the pipeline would not tag the jobs for private
runners, and therefore the merge request would use shared runners (thus
burning project minutes, but that's a different problem).

> If we can't afford the double pipelines, then we would have to write
> our GitLab CI yml rules to exclude the initial pipeline, or just do a
> very minimalist "smoke test", and focus the bulk of CI usage on the
> merge train pipeline.
>
> This is all solvable in one way or another. We just need to figure
> out the right tradeoffs we want.
>
> > However, as things stand, in case of a more generalized
> > adoption of GitLab MRs(**) the QEMU project will *not* be able to
> > shoulder the cost of running our (pretty expensive) CI on private
> > runners for all merge requests.
>
> With more generalized adoption of an MR workflow for all
> contributions, bear in mind that many of the contributors will NOT
> have the 'Developer' role on gitlab.com/qemu-project. Thus their
> merge request pipelines would run in fork context and consume their
> own CI credits, unless a "Developer" manually triggered a pipeline on
> their behalf.

I would expect most MRs to come from Developers but yes, that's not a
given.

Paolo
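[Editorial note: a minimal sketch of the mechanism being discussed -
selecting private vs. shared runners through a variable that can be set
in the project's Settings > CI/CD, which (as Daniel confirms below) is
honoured by merge-request pipelines too. The variable name, tag values
and job are made up; recent GitLab releases do expand CI variables
inside tags:.]

# Hypothetical fragment - variable and tag names are made up.
variables:
  # Default used in forks with no project-level override:
  # fall back to the GitLab.com shared runners.
  QEMU_RUNNER_TAG: "saas-linux-small-amd64"

example-build-job:
  stage: build
  tags:
    # In qemu-project/qemu, a Settings > CI/CD variable would override
    # QEMU_RUNNER_TAG with the tag of the private Kubernetes runners,
    # for push- and MR-triggered pipelines alike.
    - $QEMU_RUNNER_TAG
  script:
    - mkdir build && cd build
    - ../configure
    - make -j"$(nproc)"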
* Re: Azure infrastructure update

From: Daniel P. Berrangé @ 2023-06-28 12:07 UTC
To: Paolo Bonzini
Cc: qemu, qemu-devel, Camilla Conte, Richard Henderson, Thomas Huth,
    Markus Armbruster

On Wed, Jun 28, 2023 at 01:41:03PM +0200, Paolo Bonzini wrote:
> > If we do this, then when opening a merge request, an initial pipeline
> > would be triggered.
> >
> > If-and-only-if the maintainer has "Developer" on gitlab.com/qemu-project,
> > then that merge request initial pipeline will burn upstream CI credits.
> >
> > If they are not a "Developer", it will burn their own fork credits. If
> > they don't have any credits left, then someone with "Developer" role
> > will have to spawn a pipeline on their behalf, which will run in
> > upstream context and burn upstream credits. The latter is tedious, so
> > I think the expectation is that anyone who submits pull requests would
> > have the 'Developer' role on qemu-project. We want that anyway really,
> > so we can tag maintainers in issues on gitlab too.
>
> Agreed. Is there no option to have the "Developer" use his own credits?

I've never found any way to control / force this behaviour.

> > IOW, assume that any maintainer opening a merge req will be burning
> > upstream CI credits on their merge request pipelines. [...]
> > Merge requests are not serialized though. [...]
> > To address this would require using GitLab's "merge trains" feature.
> >
> > When merge trains are enabled and someone hits the button to apply a
> > merge request to master, an *additional* CI pipeline is started based
> > on the exact content that will be applied to master. Crucially, as
> > the name hints, the merge train pipelines are serialized. IOW, if you
> > request to apply 4 merge requests in quick succession, a queue of
> > pipelines will be created and run one after the other. If any
> > pipeline fails, that MR is kicked out of the queue, and the following
> > pipelines carry on.
> >
> > IOW, the merge trains feature replicates what we achieve with the
> > serialized 'staging' branch.
> >
> > What you can see here, though, is that every merge request will have
> > at least 2 pipelines - one when the MR is opened, and one when it is
> > applied to master - both consuming upstream CI credits.
> >
> > IOW, we potentially double our CI usage in this model if we don't
> > make any changes to how CI pipelines are triggered. [...]
> > If we can afford the CI credits, I'd keep things simple and just
> > accept the increased CI burn, but with your figures above I fear we'd
> > be too close to the limit to be relaxed about it.
>
> Hmm, now that I think about it, I'm not sure the merge request CI would
> use private runners. Would it use the CI variables that are set in
> settings/ci_cd? If not, the pipeline would not tag the jobs for private
> runners, and therefore the merge request would use shared runners (thus
> burning project minutes, but that's a different problem).

The repo/project global CI env variable settings should be honoured for
all pipelines in that repo, regardless of what action triggers the
pipeline. So I'd expect merge-request-triggered pipelines to "just
work" with the runner tagging.
With regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|