Date: Tue, 19 Oct 2021 19:01:35 -0400
From: Tom Rini
To: Simon Glass
Cc: Stefano Babic, "U-Boot@lists.denx.de"
Subject: Re: buildman stops (crashed) on current master
Message-ID: <20211019230135.GL7964@bill-the-cat>
References: <720a1fe2-894d-0e6c-ece5-b3c737857dd7@denx.de> <20211019225325.GK7964@bill-the-cat>

On Tue, Oct 19, 2021 at 04:59:15PM -0600, Simon Glass wrote:
> Hi Tom,
>
> On Tue, 19 Oct 2021 at 16:53, Tom Rini wrote:
> >
> > On Tue, Oct 19, 2021 at 05:39:12PM +0200, Stefano Babic wrote:
> > > Hi Simon,
> > >
> > > On 07.10.21 15:43, Simon Glass wrote:
> > > > Hi Stefano,
> > > >
> > > > On Thu, 7 Oct 2021 at 04:37, Stefano Babic wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > CI stops while building aarch64 without notice, for reference:
> > > > >
> > > > > https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319
> > > > >
> > > > > There is no error, the process is just killed. It looks like it stops at
> > > > > xilinx_zynqmp_virt,
> > > > >
> > > > > ./tools/buildman/buildman -o /tmp -P -E -W aarch64
> > > > > but the board can be built without issues.
> > > > >
> > > > > If I build on my host (not in docker, anyway), it generally builds fine
> > > > > - but it crashes sometimes, too. On the gitlab instance, it crashes.
> > > > > The issue does not seem to depend on merged patches, and the introduced
> > > > > boards were already built successfully. Any hint? I also have no idea
> > > > > what I should look at, as what I see is just
> > > > >
> > > > > "usr/bin/bash: line 104: 24 Killed
> > > > > ./tools/buildman/buildman -o /tmp -P -E -W aarch64"
> > > >
> > > > I cannot see that link. I am not sure what is going on. Does it say
> > > > what signal killed it?
> > >
> > > Pipelines on our server were not public - I have enabled them now for
> > > u-boot-imx.
> > >
> > > >
> > > > Does it sit there for an hour and time out? If so, then I did see that
> > > > myself once recently, when the Kconfig needed stdin, but I could not
> > > > quite tie it down. I think buildman would provide it, but sometimes
> > > > not, apparently. So it can happen when there is an existing build
> > > > there and your new one adds Kconfig options that don't have
> > > > defaults, or something like that?
> > > >
> > >
> > > I have investigated further, and I can reproduce it on my host outside the
> > > gitlab server. buildman causes an OOM, but I cannot find the cause.
> > >
> > > Strangely enough, this happens with the "aarch64" target, and I cannot
> > > reproduce it with Tom's master. So it seems that -master is ok, and something
> > > on u-boot-imx generates the OOM.
> > >
> > > However....
> > >
> > > The OOM always happens when -2 (two boards remain) appears. I can see with
> > > htop that buildman starts to allocate memory until it is exhausted (64GB RAM
> > > + 8 GB swap). Then the kernel decides that it is enough and kills buildman -
> > > this is what I see on CI.
> > >
> > > You can now see the pipelines:
> > >
> > > https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520
> > >
> > > I have then split aarch64 and built imx8 separately - same result. The
> > > pipeline stops with a xilinx board, but they have nothing to do with it.
> > > In fact, I can build all xilinx boards separately. If I run
> > > buildman -W aarch64 -x xilinx, the OOM is shown by another board.
> > >
> > > Strangely enough, I can build each single board with buildman without issues,
> > > neither errors nor warnings. Only when buildman runs them all together
> > > (aarch64, 308 boards) is the OOM generated.
> > >
> > > Bisect does not help: I started a bisect, and at the end this commit was
> > > presented:
> > >
> > > commit 53a24dee86fb72ae41e7579607bafe13442616f2
> > > Author: Fabio Estevam
> > > Date: Mon Aug 23 21:11:09 2021 -0300
> > >
> > >     imx8mm-cl-iot-gate: Split the defconfigs
> >
> > I strongly suspect what's going on here is that these new defconfigs are
> > out of sync with changes now in Kconfig. The build itself will just sit
> > there, waiting for the "oldconfig" prompt to be answered.
> >
> > I want to say the problem here is that stdin is open rather than
> > pointing to something closed, which would lead to the build failing
> > immediately rather than only once a timeout is hit or OOM kicks in due to
> > kconfig chewing up all the memory.
>
> Yes that's exactly what I saw...
>
> In fact, see this commit:
>
> e62a24ce27a buildman: Avoid hanging when the config changes
>
> But that was 3 years ago.

Looks like something else needs to be changed then. I've bisected down
similar failures here very recently.
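
To make the "stdin pointing to something closed" idea concrete, here is a
minimal Python sketch. It is an illustration only and not buildman's actual
code: run_noninteractive is a hypothetical helper, and the child command in
the example merely stands in for an interactive "oldconfig" prompt. The
assumption is that a child which finds EOF on stdin gives up quickly instead
of waiting for an answer; whether Kconfig behaves that way in every case is
not something this sketch proves.

```python
import signal
import subprocess
import sys


def run_noninteractive(cmd, timeout=None):
    """Run cmd with stdin tied to the null device so a prompt cannot block.

    Hypothetical helper for illustration; not taken from buildman.
    Returns a (returncode, description) tuple.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # any read sees EOF at once
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,           # backstop in case the child still hangs
        )
    except subprocess.TimeoutExpired:
        return None, "timed out"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal,
        # e.g. SIGKILL from the kernel's OOM killer.
        name = signal.Signals(-proc.returncode).name
        return proc.returncode, "killed by " + name
    return proc.returncode, "exited with status %d" % proc.returncode


if __name__ == "__main__":
    # Stand-in for a prompt: exit 0 if a line could be read, 3 on EOF.
    child = [sys.executable, "-c",
             "import sys; sys.exit(0 if sys.stdin.readline() else 3)"]
    print(run_noninteractive(child, timeout=30))
```

With stdin on the null device the stand-in child sees EOF and exits with
status 3 straight away. The same helper also answers the earlier question of
which signal killed a process, since a child ended by a signal is reported by
name rather than as a bare "Killed".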
-- 
Tom