From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
To: Andrew Cooper, Ian Campbell, grub-devel@gnu.org
References: <5600628A.20202@zappa.cx> <1442912018.10338.118.camel@citrix.com> <56A226F8.3020301@gmail.com> <56A22847.3020708@citrix.com>
From: Vladimir 'φ-coder/phcoder' Serbinenko
Message-ID: <56A229DA.7030904@gmail.com>
Date: Fri, 22 Jan 2016 14:08:42 +0100
MIME-Version: 1.0
In-Reply-To: <56A22847.3020708@citrix.com>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="NiAPK1EO5piikBI2rnGr819wuLJIulk1a"
Cc: Andreas Sundstrom, xen-devel@lists.xen.org
List-Id: The development of GNU GRUB

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--NiAPK1EO5piikBI2rnGr819wuLJIulk1a
Content-Type: text/plain; charset=utf-8

On 22.01.2016 14:01, Andrew Cooper wrote:
> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> On 22.09.2015 10:53, Ian
Campbell wrote:
>>> Hi Vladimir & grub-devel,
>>>
>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>
>> Is it still an issue? If so I'll try to replicate it. From the stack dump I
>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>> condition with itself but may be one with some Xen part. An alternative
>> possibility is that grub forgets to flush cache at some point in the boot
>> process.
>
> Looks like GRUB doesn't have a traptable registered with Xen (the PV
> equivalent of the IDT).
>
> First, Xen tried to inject a #GP fault and found that the entry EIP was
> at 0 (which is sadly the default if nothing is specified). It then took
> a pagefault while attempting to inject the #GP, and crashed the domain.
>
Do you have a link on how to add one? We can put a catch-stacktrace-abort on it.
> ~Andrew
>
>>> Thanks, Ian.
>>>
>>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>>> applied) and Xen 4.4.1
>>>>
>>>> I originally posted a bug report with Debian but got the suggestion to
>>>> file bugs with upstream as well.
>>>> Debian bug report:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>>
>>>> Note that my original thought was that this bug probably is within GRUB.
>>>> But Ian asked me to file a bug with Xen as well, you have to live with the
>>>> fact that it is centered around GRUB though.
>>>>
>>>> Here's the information from my original bug report:
>>>>
>>>> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
>>>> fails when chainloading the domU's grub. 64-bit domU seems to work 100%
>>>> of the time.
>>>>
>>>> My understanding of the process:
>>>>
>>>> * dom0 launches domU with grub that is loaded from dom0's disk.
>>>> * Grub reads config file from memdisk, and then looks for grub binary in
>>>> domU filesystem.
>>>> * If grub is found in domU it then chainloads (multiboot) that grub
>>>> binary
>>>> and the domU grub reads grub.cfg and continues booting.
>>>> * If grub is not found in domU it reads grub.cfg and continues with
>>>> boot.
>>>>
>>>> It fails at step 3 in my list of the boot process, but sometimes it
>>>> does work so it may be something like a race condition that causes the
>>>> problem?
>>>>
>>>> A workaround is to not install, or to rename, /boot/xen in domU so that the
>>>> first grub that is loaded from dom0's disk will not find the grub
>>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>>> boot. The drawback of this is of course that the two versions can't
>>>> differ too much as there are different setups creating grub.cfg and
>>>> then reading/parsing it at boot time.
>>>>
>>>> I am not sure at this point whether this is a problem in XEN or a
>>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>>> from XEN (don't really know much more about it) and when that legacy
>>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>>> the legacy pvgrub is not a real alternative as it's not packaged for
>>>> Debian though.
>>>>
>>>> When it fails "xl create vm -c" outputs this:
>>>> Parsing config from /etc/xen/vm
>>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>>> type for domid=16
>>>> Unable to attach console
>>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>>> child [0] exited with error status 1
>>>>
>>>> And "xl dmesg" shows errors like this:
>>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>>> 0x0000000000000000 to 0x000000000000ffff.
>>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>>> (XEN) Pagetable walk from 0000000000000000:
>>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>>> compat_create_bounce_frame+0xc6/0xde
>>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e019:[<0000000000000000>]
>>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>>> (XEN) Guest stack trace from esp=005a5ff0:
>>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>>> 0016b388
>>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>>> 0016b380
>>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>>> 0016b378
>>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>>> 0016b370
>>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>>> 0016b368
>>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>>> 0016b360
>>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>>> 0016b358
>>>> (XEN) 0016b357 0016b356 0016b355 0016b354
0016b353 0016b352 0016b351
>>>> 0016b350
>>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>>> 0016b348
>>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>>> 0016b340
>>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>>> 0016b338
>>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>>> 0016b330
>>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>>> 0016b328
>>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>>> 0016b320
>>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>>> 0016b318
>>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>>> 0016b310
>>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>>> 0016b308
>>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>>> 0016b300
>>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>>> 0016b2f8
>>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>>> 0016b2f0
>>>>
>>>> An easy way to find out which grub you are in if the machine boots is
>>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>>> (memdisk). So when trying to replicate the issue (and the domU
>>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>>> than 4-5 times in a row before it fails, often it fails on my first
>>>> try.
>>>>
>>>> For information I have reproduced on two different AMD desktop
>>>> processor machines, not sure if Intel would be any different. I'm
>>>> pretty sure I did tests with grub from unstable with same result at
>>>> some point, but can test again if that is likely to work.
>>>>
>>>> The package that is installed on the domU side is "grub-xen".
>>>>
>>>> I am unable to understand how to debug grub further on my own, I have
>>>> printed out text from grub so that I understood that it is the
>>>> chainload that fails. I see no output from the domU grub (except when
>>>> it works as it should of course). I can help with further testing if
>>>> needed.
>>>>
>>>> /Andreas
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>>
>

--NiAPK1EO5piikBI2rnGr819wuLJIulk1a
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREKAAYFAlaiKdoACgkQmBXlbbo5nOvslgD/Sc6Zd9U9+yIOIUw+FjdKJIp3
gk6tfn2XxY+mLTmPXQgA/1IHUho0RaVlbyWIJFJ5EN7XxZAyQNvxo7mNGaIqfC/z
=w8Rf
-----END PGP SIGNATURE-----

--NiAPK1EO5piikBI2rnGr819wuLJIulk1a--