From: Philipp Hahn
Subject: Re: [SOLVED] 2.6.32 stuck in flush_tlb_others_ipi()
Date: Thu, 19 Apr 2012 15:42:31 +0200
Message-ID: <201204191542.37295.hahn@univention.de>
References: <201201091241.45281.hahn@univention.de> <201203301944.55376.hahn@univention.de>
In-Reply-To: <201203301944.55376.hahn@univention.de>
To: kvm

Hello,

good news:

On Friday 30 March 2012 19:44:50 you wrote:
> On Monday 09 January 2012 12:41:41 Philipp Hahn wrote:
> > one of our VMs regularly gets stuck: the VM is completely unresponsive (no
> > ssh, no serial console, no VNC).
> > Using "gdbserver" and a remote system to
> > debug the running VM, I see 3 CPUs (1,3,4) stuck in
> >   pgd_alloc() → spin_lock_irqsave(pgd_lock)
> > while the 4th CPU (2) is waiting in
> >   pgd_alloc() → pgd_prepopulate_pmd() →... → flush_tlb_others_ipi()
> >
> > 195             while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
> > 196                     cpu_relax();
> > (gdb) print f->flush_cpumask
> > $5 = {1}
> >
> > CPU 1 is doing a do_exec() syscall, while CPUs 2-4 are doing a do_fork()
> > syscall according to "thread apply all backtrace".

It's a guest kernel bug already fixed in v2.6.38 [1], but not (yet) back-ported
to 2.6.32-longterm. [2] fixed a bug with TLB flushing when using PAE, which
made the hidden bug trigger a lot more often. It only happens when using a
PAE-enabled guest kernel with >=2 CPUs.

Full details are in our German Bugzilla [3].

[1]
[2]
[3]

Sincerely
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                    http://www.univention.de/
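
The hang in the quoted trace can be read as a circular wait, sketched below as a tiny model. This is illustrative C, not kernel code: `cpumask_model_t`, `cpu_bit()` and `flush_can_never_complete()` are hypothetical names. It assumes that the CPUs spinning in spin_lock_irqsave(pgd_lock) have interrupts disabled and therefore cannot service the TLB flush IPI, and that the printed mask `{1}` (bit 0) refers to the CPU called "CPU 1" above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: one bit per CPU, bit 0 = "CPU 1" in the mail. */
typedef uint32_t cpumask_model_t;

#define MODEL_MAX_CPUS 32u

/* Return the mask bit for a 0-based CPU index. */
static cpumask_model_t cpu_bit(unsigned cpu)
{
    assert(cpu < MODEL_MAX_CPUS);
    return (cpumask_model_t)1u << cpu;
}

/*
 * The lock holder spins until every CPU in flush_mask has handled the
 * flush IPI and cleared its bit. A CPU that is itself spinning for the
 * same lock with interrupts off (assumption) never handles that IPI, so
 * the wait cannot finish if any CPU appears in both masks.
 */
static bool flush_can_never_complete(cpumask_model_t flush_mask,
                                     cpumask_model_t irqs_off_waiters)
{
    return (flush_mask & irqs_off_waiters) != 0;
}
```

With the situation from the trace (waiters on bits 0, 2 and 3, pending flush mask bit 0), `flush_can_never_complete(cpu_bit(0), cpu_bit(0) | cpu_bit(2) | cpu_bit(3))` returns true: the lock holder waits for a CPU that is in turn waiting for the lock.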