Date: Wed, 28 Jun 2023 12:28:02 +0100
From: Daniel P. Berrangé
To: Paolo Bonzini
Cc: qemu@sfconservancy.org, qemu-devel, Camilla Conte, Richard Henderson,
 Thomas Huth, "Armbruster, Markus"
Subject: Re: Azure infrastructure update

On Wed, Jun 28, 2023 at 12:44:33PM +0200, Paolo Bonzini wrote:
> Starting June 1, all pipelines running in qemu-project/qemu have been
> using the private runners. Besides benefiting from the higher number
> of vCPUs per job, this, it leaves the GitLab shared runner allowance
> to Windows jobs as well as updates to qemu-web.

Also the python-qemu-qmp.git CI is on shared runners currently.

> Our consumption of Azure credits was as follows:
> * $2005 as of Jun 1, of which $371 used for the now-deleted D2s VM
> * $2673 as of Jun 28, of which $457 used for the now-deleted D2s VM
>
> Based on the credits consumed from Jun 1 to Jun 28, which should be
> representative of normal resource use, I am estimating the Azure costs
> as follows:

The only caveat is that June did not coincide with a soft freeze. My
impression is that our CI pipeline usage spikes in the weeks around
the freeze.

> $6700 for this year, of which:
> - $1650 for the E2s VM
> - $450 for the now-deleted D2s VM
> - $1600 for the Kubernetes compute nodes
> - $2500 for AKS (Azure Kubernetes Service) including system nodes,
>   load balancing, monitoring and a few more itemized services(*)
> - $500 for bandwidth and IP address allocation
>
> $7800 starting next year, of which:
> - $1900 for the E2s VM

Same size VM as last year, but more? Is this simply you anticipating
possible price increases from Azure?

> - $2250 for the Kubernetes compute nodes

IIUC, the $1600 for this year will cover about 7.5 months' worth of
usage (Jun -> Dec), which would imply more like $2500 for a full 12
months (1600 / 7.5 * 12 is about 2560), possibly more if we add in
peaks for soft freeze.

IOW, it could conceivably be closer to the $3k mark without much
difficulty, especially if we also start doing more pipelines for
stable branches on a regular basis, now that we have CI working
properly for stable.

> - $3100 for AKS-related services

Same question about anticipated prices?

> - $550 for bandwidth and IP address allocation
>
> This fits within the allowance of the Azure open source credits
> program, while leaving some leeway in case of increased costs or
> increased usage of the private runners. As a contingency plan in case
> costs surge, we can always disable usage of the private runners and
> revert to wider usage of shared runners.

We also still have Eldondev's physical machine set up as a runner,
assuming that's going to be available indefinitely if we need the
resource.

> That said, the cost for the compute nodes is not small. In particular,
> at the last QEMU Summit we discussed the possibility of adopting a
> merge request workflow for maintainer pull requests. These merge
> requests would replace the pipelines that are run by committers as
> part of merging trees, and therefore should not introduce excessive
> costs.

Depending on how we set up the CI workflow, it might increase our
usage, potentially doubling it quite easily.

Right now, whoever is doing CI for merging pull requests is fully
serializing CI pipelines, so what's tested 100% matches what is
merged to master.

With a merge request workflow it can be slightly different based on
a couple of variables.

When a maintainer pushes a branch to their fork, prior to opening the
merge request, CI credits are burnt in their fork for every push,
based on whatever is the HEAD of their branch. This might be behind
current upstream 'master' by some amount.

Typically when using merge requests though, you would change the
gitlab CI workflow rules to trigger CI pipelines from merge request
actions, instead of branch push actions.

If we do this, then when opening a merge request, an initial pipeline
would be triggered.

If-and-only-if the maintainer has "Developer" on
gitlab.com/qemu-project, then that merge request initial pipeline
will burn upstream CI credits.

If they are not a "Developer", it will burn their own fork credits.
If they don't have any credits left, then someone with the
"Developer" role will have to spawn a pipeline on their behalf, which
will run in upstream context and burn upstream credits. The latter is
tedious, so I think the expectation is that anyone who submits pull
requests would have the 'Developer' role on qemu-project. We want
that anyway really, so we can tag maintainers in issues on gitlab too.

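To make the credit rules above concrete, here is a minimal sketch in Python. The function name and the returned labels are made up for illustration; it only models the three cases described in this mail and does not talk to any GitLab API.

```python
def pipeline_credit_source(has_developer_role: bool,
                           fork_has_credits: bool) -> str:
    """Return whose CI credits a merge request pipeline would consume.

    Hypothetical helper that mirrors the rules described in the mail.
    """
    if has_developer_role:
        # Project members get their MR pipeline run in upstream context.
        return "upstream"
    if fork_has_credits:
        # Non-members run the pipeline in their own fork.
        return "fork"
    # No role and no credits: a "Developer" must start the pipeline
    # by hand, which again runs in upstream context.
    return "upstream (manually triggered)"


print(pipeline_credit_source(has_developer_role=True, fork_has_credits=True))
print(pipeline_credit_source(has_developer_role=False, fork_has_credits=True))
print(pipeline_credit_source(has_developer_role=False, fork_has_credits=False))
```

Only the middle case keeps the cost off the upstream allowance, which is why giving all pull request submitters the 'Developer' role moves their pipelines onto upstream credits.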
IOW, assume that any maintainer opening a merge req will be burning
upstream CI credits on their merge request pipelines.

This initial pipeline will run against a merge commit that grafts
together the head of the pull request and 'master' at the time the
pipeline was triggered. In a default config, if we apply the merge
request at that point it would go into master with no further
pipeline run.

Merge requests are not serialized though. So if a second merge
request had been applied to master, after the time the first merge
request pipeline started, the pipeline for the first merge request
is potentially invalid. Compared to our use of the (serialized)
pipelines on the 'staging' branch, this setup would be a regression
in coverage.

Addressing this would require using GitLab's "merge trains" feature.

When merge trains are enabled, when someone hits the button to apply
a merge request to master, an *additional* CI pipeline is started
based on the exact content that will be applied to master. Crucially,
as the name hints, the merge train pipelines are serialized. IOW, if
you request to apply 4 merge requests in quick succession a queue of
pipelines will be created and run one after the other. If any
pipeline fails, that MR is kicked out of the queue, and the
following pipelines carry on.

IOW, the merge trains feature replicates what we achieve with the
serialized 'staging' branch.

What you can see here though, is that every merge request will have
at least 2 pipelines - one when the MR is opened, and one when it is
applied to master - both consuming upstream CI credits.

IOW, we potentially double our CI usage in this model if we don't
make any changes to how CI pipelines are triggered.

Essentially the idea with merge requests is that the initial pipeline
upon opening the merge request does full validation and catches all
the silly stuff. Failures are ok because this is all parallelized
with other MRs, so failures don't delay anything/anyone else.
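The queue behaviour and the doubling of pipelines can be sketched as below. This is a toy model of the serialized queue as described in this mail, not of GitLab's actual scheduler; `run_merge_train`, `pipelines_per_cycle` and the MR names are hypothetical.

```python
from collections import deque


def run_merge_train(queue, passes):
    """Process MRs strictly in order, one train pipeline per MR.

    Returns (merged, kicked_out, pipelines_run).
    """
    train = deque(queue)
    merged, kicked_out = [], []
    pipelines_run = 0
    while train:
        mr = train.popleft()
        pipelines_run += 1          # one train pipeline for this MR
        if passes(mr):
            merged.append(mr)
        else:
            kicked_out.append(mr)   # failed MR leaves the queue, rest carry on
    return merged, kicked_out, pipelines_run


def pipelines_per_cycle(num_mrs, initial_pipeline=True):
    """Count pipelines if every MR is opened once and applied once."""
    per_mr = 2 if initial_pipeline else 1
    return num_mrs * per_mr


merged, kicked_out, n = run_merge_train(
    ["mr1", "mr2", "mr3", "mr4"], passes=lambda mr: mr != "mr2")
print(merged, kicked_out, n)
print(pipelines_per_cycle(4), pipelines_per_cycle(4, initial_pipeline=False))
```

With four MRs this counts eight pipelines when the initial pipeline is kept and four when it is skipped, which is the factor of two discussed above.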
The merge train is then the safety net to prove the original pipeline
results are still valid for current HEAD at the time of applying it.
You want the merge train pipelines to essentially never fail, as
that's disruptive to anything following on.

If we can afford the CI credits, I'd keep things simple and just
accept the increased CI burn, but with your figures above I fear
we'd be too close to the limit to be relaxed about it.

The extra eldondev runner could come into play here possibly.

If we can't afford the double pipelines, then we would have to write
our GitLab CI yml rules to exclude the initial pipeline, or just do
a very minimalist "smoke test", and focus the bulk of CI usage on the
merge train pipeline.

This is all solvable in one way or another. We just need to figure
out the right tradeoffs we want.

> However, as things stand, in case of a more generalized
> adoption of GitLab MRs(**) the QEMU project will *not* be able to
> shoulder the cost of running our (pretty expensive) CI on private
> runners for all merge requests.

With more generalized adoption of the MR workflow for all
contributions, bear in mind that many of the contributors will NOT
have the 'Developer' role on gitlab.com/qemu-project. Thus their
merge request pipelines would run in fork context and consume their
own CI credits, unless a "Developer" has to manually trigger a
pipeline on their behalf.

So yes, I agree that full adoption of MRs would definitely increase
our CI usage, but not by quite such a horrendous amount as you might
first think.

We would definitely need more resources whichever way you look at it
though.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|