From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dario Faggioli
Subject: Re: Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]
Date: Tue, 26 Sep 2017 22:51:50 +0200
Message-ID: <1506459110.27663.41.camel@citrix.com>
References: <1506348460.27663.3.camel@citrix.com> <1506411226.27663.28.camel@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1106507755700510244=="
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: Julien Grall, osstest service owner, xen-devel@lists.xensource.com
Cc: Meng Xu, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

--===============1106507755700510244==
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-0UYyeS1lrql/wpwOtFGD"

--=-0UYyeS1lrql/wpwOtFGD
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, 2017-09-26 at 18:28 +0100, Julien Grall wrote:
> On 09/26/2017 08:33 AM, Dario Faggioli wrote:
> > >
> > Here's the logs:
> > http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html
>
> It does not seem to be similar, in the credit2 case the kernel is
> stuck at very early boot.
> Here it seems it is running (there are grants setup).
>
Yes, I agree, it's not totally similar.

> This seem to be confirmed from the guest console log, I can see the
> prompt. Interestingly when the guest job fails, it has been waiting
> for a long time disk and hvc0. Although, it does not timeout.
>
Ah, I see what you mean, I found it in the guest console log.

> I am actually quite surprised that we start a 4 vCPUs guest on a 2
> pCPUs platform. The total of vCPUs is 6 (2 DOM0 + 4 DOMU). The
> processors in are not the greatest for testing.
> So I was wondering if we end up to have too many vCPUs running on the
> platform and making it unreliable the test?
>
Well, doing that, with this scheduler, is certainly *not* the best
recipe for determinism and reliability.

In fact, RTDS is a non-work conserving scheduler. This means that (with
default parameters) each vCPU gets at most 40% CPU time, even if there
are idle cycles.

With 6 vCPUs, there's a total demand of 240% of CPU time, and with 2
pCPUs, there's at most 200% of that available, which means we're in
overload (well, at least that's the case if/when all the vCPUs try to
execute for their guaranteed 40%).

Things *should really not* explode (as in, Xen crashing) if that
happens; actually, from a scheduler perspective, it should really not
be too big of a deal (especially if the overload is transient, like I
guess it should be in this case). However, it's entirely possible that
some specific vCPUs failing to be scheduled for a certain amount of
time causes something _inside_ the guest to time out, or get stuck or
wedged, which may be what happens here.

I'm adding Meng to Cc, to see what he thinks about this situation.
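
To make the overload arithmetic above concrete, here is a minimal
sketch in Python. It is only an illustration, not Xen code: the
function names are hypothetical, and the 4000us budget / 10000us
period pair is an assumption chosen to give the 40% per-vCPU share
mentioned above.

```python
from fractions import Fraction


def total_demand(num_vcpus, budget_us=4000, period_us=10000):
    """Worst-case CPU demand, in units of one pCPU.

    Assumes every vCPU tries to consume its full budget in each period
    (budget_us / period_us = 40% with the illustrative values above).
    """
    return num_vcpus * Fraction(budget_us, period_us)


def is_overloaded(num_vcpus, num_pcpus, budget_us=4000, period_us=10000):
    """True if the worst-case demand exceeds the available pCPUs."""
    return total_demand(num_vcpus, budget_us, period_us) > num_pcpus


if __name__ == "__main__":
    vcpus = 2 + 4  # 2 dom0 vCPUs + 4 domU vCPUs
    pcpus = 2
    demand = total_demand(vcpus)
    print(f"demand: {int(demand * 100)}%, capacity: {pcpus * 100}%")
    print("overloaded:", is_overloaded(vcpus, pcpus))
```

With these numbers the demand is 240% against a capacity of 200%, so
the check reports overload; with 5 vCPUs it would sit exactly at
capacity.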

Thanks and Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--=-0UYyeS1lrql/wpwOtFGD--

--===============1106507755700510244==--