From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 4 Sep 2020 11:10:00 -0400
From: Cleber Rosa
To: Daniel P. Berrangé
Subject: Re: [PATCH v2 2/2] GitLab Gating CI: initial set of jobs, documentation and scripts
Message-ID: <20200904151000.GC232153@localhost.localdomain>
References: <20200709024657.2500558-1-crosa@redhat.com>
 <20200709024657.2500558-3-crosa@redhat.com>
 <20200709103029.GK3753300@redhat.com>
 <20200904001139.GE55646@localhost.localdomain>
 <20200904081816.GB721059@redhat.com>
MIME-Version: 1.0
In-Reply-To: <20200904081816.GB721059@redhat.com>
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="yLVHuoLXiP9kZBkt"
Content-Disposition: inline
Cc: Peter Maydell,
 Thomas Huth, Beraldo Leal, Erik Skultety, Philippe Mathieu-Daudé,
 qemu-devel@nongnu.org, Wainer dos Santos Moschetta, Willian Rampazzo,
 Alex Bennée, Eduardo Habkost

--yLVHuoLXiP9kZBkt
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Sep 04, 2020 at 09:18:16AM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 03, 2020 at 08:11:39PM -0400, Cleber Rosa wrote:
> > On Thu, Jul 09, 2020 at 11:30:29AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jul 08, 2020 at 10:46:57PM -0400, Cleber Rosa wrote:
> > > > This is a mapping of Peter's "remake-merge-builds" and
> > > > "pull-buildtest" scripts, gone through some updates, adding some build
> > > > option and removing others.
> > > >
> > > > The jobs currently cover the machines that the QEMU project owns, and that
> > > > are setup and ready to run jobs:
> > > >
> > > >  - Ubuntu 18.04 on S390x
> > > >  - Ubuntu 20.04 on aarch64
> > > >
> > > > During the development of this set of jobs, the GitLab CI was tested
> > > > with many other architectures, including ppc64, s390x and aarch64,
> > > > along with the other OSs (not included here):
> > > >
> > > >  - Fedora 30
> > > >  - FreeBSD 12.1
> > > >
> > > > More information can be found in the documentation itself.
> > > >
> > > > Signed-off-by: Cleber Rosa
> > > > ---
> > > >  .gitlab-ci.d/gating.yml | 146 +++++++++++++++++
> > >
> > > AFAIK, the jobs in this file just augment what is already defined
> > > in the main .gitlab-ci.yml. Also since we're providing setup info
> > > for other people to configure custom runners, these jobs are usable
> > > for non-gating CI scenarios too.
> > >
> >
> > If you mean that they introduced new jobs, you're right.
> >
> > > IOW, the jobs in this file happen to be usable for gating, but they
> > > are not the only gating jobs, and can be used for non-gating reasons.
> > >
> >
> > Right, I do not doubt these jobs may be useful to other people and on
> > scenarios other than "before merging a patch series".
> >
> > > This is a complicated way of saying that gating.yml is not a desirable
> > > filename, so I'd suggest splitting it in two and having these files
> > > named based on what their contents is, rather than their use case:
> > >
> > >   .gitlab-ci.d/runners-s390x.yml
> > >   .gitlab-ci.d/runners-aarch64.yml
> > >
> > > The existing jobs in .gitlab-ci.yml could possibly be moved into
> > > a .gitlab-ci.d/runners-shared.yml file for consistency.
> > >
> >
> > Do you imply that every gitlab CI job should be a gating job? And
> > that the same jobs should be used when other people with their own
> > forks? I find this problematic because:
> >
> > * It would trigger pipelines with jobs that, unless every user has the
> >   same runners configured, would have unfulfilled jobs that don't have
> >   a matching hardware.
>
> Jobs that require a custom runner should not be set to run by default,
> but individual contributors must absolutely be able to opt-in to running
> those jobs simply by registering a runner on their account.
>

Agreed, and that's why they have been put into this different "gating"
class here.

> > * It dilutes the idea that those jobs are inherently different with
> >   regards to the management of their infrastructure.
>
> I don't really know what you mean here, but "Inherently different"
> does not sound like a desirable property.
>

Organizations and individuals will have responsibility over the
infrastructure they choose to add, which is "inherently different"
from the GitLab shared machines. Not sure there's a way around it.
> > * It destroys the notion of layered testing, for whatever people find
> >   that worth it, where a faster turnaround could/would be possible
> >   with fewer jobs for every push, and many more jobs before a merge.
>
> The key goal of CI is to reduce the burden on maintainers. The biggest
> cost is if we merge code and failure is noticed after merge. It is
> still a large cost, however, if Peter only finds a CI failure when he
> attempts the pre-merge test. He has to throw out the pull request
> putting more work on the subsystem maintainer. The subsystem maintainer
> may have to throw it back to the original author.
>
> The ideal scenario that we need to strive towards is that the original
> author has tested their code with 100% coverage of all the CI jobs QEMU
> has defined.
>

I agree... but it's also unrealistic at this point, right? For
instance, do we have s390x boxes to run all of those? Avocado has
been using Travis CI for s390x/ppc64/aarch64, and those are quite
unreliable even with a load many orders of magnitude smaller than
that of the QEMU project. So, resources are needed to have this flat,
100% coverage, "ideal scenario" you describe.

> Any time there is a job that is not run by authors, but only by the
> maintainers, we are putting increased burden on the maintainers, so
> must minimize that.
>

I agree. But if resources are limited, then should the testing scope
be decreased so that it's equalized?

> IOW, layered testing is not desirable as goal. Rather layered testing
> is just a default setup, but we'd encourage contributors to run the
> full set of CI jobs, especially if they are frequent contributors.
> The more they run themselves, the less burden on subsystem maintainers
> and Peter, and thus the better we all scale.
>

We agree on the goals; we don't agree on the strategy, though.

> > Finally, I find the split by runner architecture you suggested
> > problematic because different organizations may have jobs for the same
> > architecture.
> > I believe that files for different organizations may be
> > a better organization instead. Entries in the MAINTAINERS are one
> > example where the grouping by architecture may not be optimal.
>
> I don't think we should be structuring jobs around organizations. We
> should be defining a set of desired jobs we wish to be able to run.
> Any organization can bring a runner that is capable of running the
> jobs and donate it to the QEMU project for our formal CI runner.
> The organization is not defining the job though - QEMU is defining
> the jobs we expect to have used for testing.
>

This was discussed previously[1].

> This is key because any contributor needs to be able to spin up an
> identical environment to replicate any build failures. We don't want
> runners for merge testing that are built as a blackbox by someone.
> That is the single biggest painpoint with Peter's current merge
> jobs - we can't easily replicate Peter's merge env even if we had
> the matching hardware available.
>

With the right automation, such as the playbooks introduced here, any
person with the same hardware should have an environment to replicate
a job and debug an issue.

[1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html

Best regards,
- Cleber.
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--yLVHuoLXiP9kZBkt
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEeruW64tGuU1eD+m7ZX6NM6XyCfMFAl9SWMUACgkQZX6NM6Xy
CfNIEw//d52NKTDmoUH4nyHpCebPdmS/te8NfokTht1HPfU5oHjeU5W03EAeww21
i0AYvI7Br5efkxGUyIPhZT52vdDXKgw1hvKfOrYZ1VFtuuKtgaCrvanLqHb5Tmxt
PMiX/abgrOVwuJO86GyoOoyuVvpJKj4n28IbLJXixxY+rJYlxXBX0kfUo1b59oOD
uHKaEwy6EsHDlHTI5EMsSElRdn4IIi+eKJi97JLsG0fwqOE5C5ekDM7FCQ5qtOIg
GFONoVM5t1hUe15OSqwYmusdCbtabjLiIZ8mioT5f4UCLihka9kQz6dFY+RJ3JeP
tOinqBn0YUtaXakR9ODHQlJ94D37GaCXBe/fr4U09+QfnHuzSLniXLfAz8hDLkBV
1CWeEYCBd5050SaGgcXTiP5Q04WwyRHv2MlRxmV7IyzlMkvWBD64/o9gWGWUPgAu
OcvXfb0snzhMg6YX/8nFx9X9BFq2tq2MmhC9nkUx9ujuVg5gkqxdaSRo/dk8oMMx
6524GDNAObDxMjyT4OuvvtnfLB3wJB3UTS2MvMDjMNaTszLq8T3wtdkDltUjITs5
NWUw7hgAOWdERNe6rBDokFe4Zg+dPGFsYHJDX6G/xkF+lLxQTs0OemRfqtfLqP9y
9XB8y3u3V4YRmtynsmahJJwYwLuh1o0qxG1ltE8B1vx07EN2GQE=
=yAHT
-----END PGP SIGNATURE-----

--yLVHuoLXiP9kZBkt--