From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stefan Bader Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load Date: Wed, 17 Oct 2012 15:45:56 +0200 Message-ID: <507EB694.8070105@canonical.com> References: <1350479456-4007-1-git-send-email-stefan.bader@canonical.com> <507EB27D.8050308@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============3687563582694962846==" Return-path: In-Reply-To: <507EB27D.8050308@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Andrew Cooper Cc: Konrad Rzeszutek Wilk , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --===============3687563582694962846== Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="------------enig81D271FA730DEC5D8D196885" This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig81D271FA730DEC5D8D196885 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 17.10.2012 15:28, Andrew Cooper wrote: > On 17/10/12 14:10, Stefan Bader wrote: >> I am currently looking at a bug report[1] which is happening when >> a Xen PVM guest with multiple VCPUs is running a high IO database >> load (a test script is available in the bug report). >> >> In experimenting it seems that this happens (or is getting more >> likely) when the number of VCPUs is 8 or higher (though I have >> not tried 6, only 2 and 4), having autogroup enabled seems to >> make it more likely, too (at some point thought it would actually >> prevent it but we were wrong) and pv spinlocks enabled. >> It has happened with older (3.4.3) and newer (4.1.2) versions of >> Xen as a host and with 3.2 and 3.5 kernels as guests. >> >> The pv spinlock assumption I will try to get re-verified by asking >> to reproduce under a real load with a kernel that just disables >> that. However, the dumps I am looking at really do look odd there. >> >> The first dump I looked at had the spinlock of runqueue[0] being >> placed into the per-cpu lock_spinners variable for cpu#0 and >> cpu#7 (doing my tests with 8 VCPUs). So apparently both cpus were >> waiting on the slow path for it to become free. Though actually >> it was free! Now, here is one issue I have in understanding the >> dump: the back traces produced in crash are in the normal form >> not showing any cpu in the poll_irq HV call. Only when using >> the form that uses the last known stack location and displays >> all text symols found will get close for cpu#7. cpu#0 still does >> not seem to be anywhere close. This could be a problem with crash, >> or with the way PVM works, I am not sure. >> >> Anyway, from what I could take from that situation, it seemed that >> cpu#1 (that one had soft lockup warnings in the log) seemed to try >> to do a wake_up on the task that just seemed to have done an io >> schedule on cpu#7 (but the waitqueue_head spinlock is currently >> locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle >> migration (though the lock currently is free). So I guessed that >> maybe cpu#0 was just woken for the free lock but has not grabbed >> it yet. >> >> From there I wondered whether something that acquires a spinlock >> usually more than the quick path timeout (and this causes new >> lockers to go into the slow path), could cause a stall on a high >> numbered cpu when the lock is contented (as the search and >> signalling is only done for the first waiter starting from 0). >> That lead to below patch. The good thing about it, it does not >> break things immediately, the bad thing, it does not help with >> the problem. :/ >> >> The interesting thing when looking at a second dump, taken with >> a testkernel using the patch below, was that again 2 cpus seemed >> to spin slow on a lock that was free in the dump. And again at >> least one did not seem to be doing anything spinlock related >> (anymore?). >> One other detail (but that could be just incidence as well) was >> that in unmodified kernel I usually would end up with soft >> lockup messages, with the patched kernel, that was a stall >> detected by rcu_bh (0 and 1 stalled, detected by 3). >> >> Now I am a bit confused and wonder about some basic things: >> 1. When a cpu goes into the slow lock path and into the poll_irq, >> shouldn't I expect this one to be in hypercall_page in the >> dump? >> 2. When does the whole handling for interrupted spinlock wait >> apply? I assume only for a spinlock taken without irqs >> disabled and then trying to acquire another one in an >> interrupt handler. Is that correct? >> 3. Not really related but I just stumbled over it: >> In xen_spin_trylock: "asm("xchgb %b0,%1"...) >> What does the "b" part of %b0 do? I thought xchgb already >> would say it is a byte value... >> >> But putting aside those questions, the fact that two times >> there was two cpus waiting on the same lock which from the >> lock count (=3D0) was free seems a bit too much of a coincidence. >> Oh and the spinners count in the lock was 2 as one would >> expect. >> >> struct rq { >> lock =3D { >> raw_lock =3D { >> { >> head_tail =3D 512,=20 >> tickets =3D { >> head =3D 0 '\000',=20 >> tail =3D 2 '\002' >> } >> } >> } >> },=20 >> nr_running =3D 1, >> ... >> } >> >> I really don't know how this happens. Especially cpu#0 seems at >> least past the wakeup and should have removed itself from the >> spinners list... >> =20 >> -Stefan >=20 > We (Citrix) have an outstanding bug, on 2.6.32 classic, which we have > never managed to get to the bottom of (because of inability to reliably= > reproduce), which might be related. >=20 > In our case, certain processes were locking up, and it turned out that > the kernel was issuing SCHOP_poll hypercalls (same the stack trace on > your launchpad ticket) on its own spinlock IPI event channel > (understandable, as its a spinlock), but with the event channel masked,= > so it would never wake up from the poll. Unfortunately either crash looking at it or something else is wrong for t= he PV guest as looking at HYPERVISOR_shared_info appears completely blank. >=20 > Can you reliably reproduce the issue? and if so, would it be possible t= o > capture hypervisor debug keys q and e when the badness happens? This > would clearly show if our bugs are related or not Yes, as mentioned the python script in the bug report does that when meet= ing the other conditions. I certainly can do the debug keys on that. >=20 > Our repro case took about a month to reproduce, and unfortunately > turning lockdep on has appeared to fix the problem. Yeah, having lockdep enabled was another thing that made it less likely. = Though some reporters had lockups with it, too. Just much less likely. -Stefan >=20 > ~Andrew >=20 >> >> [1] http://bugs.launchpad.net/bugs/1011792 >> --------------enig81D271FA730DEC5D8D196885 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQIcBAEBCgAGBQJQfraVAAoJEOhnXe7L7s6jtCQP/1Li9oKebZMBeK1os42nAu0v uOeizVRkV4Brl0hSTjXRJnzPWXnOAho5mKU429m0VJrxLeo8uHHqMApTJBF8YEFF Nf+NkEg18khpLn08jlFt6Gz2aKMYi4zoRr8nBtIzwrw21bOSlOzMaT2EO39UtHvm PX3HkMZWfkvEp/FTNBjynksOTMCp8W/dGXCMjA4h58fUPujl4IYjWhIUwvsI++Rc m61PUi/LeWl0eE8xBYtp/piJwYkzl8/dLkyHo+qu0HlQ4jrQVYKsv5w+f0hfP1sY uWAtGDFWALCJY8QYVqM81QiS3mPEWbG9B3pnJxarfSZx9IED62Ea4rXh3Vdm3xKc Hk22e2abi/NSq81EUdbJMaH4Y0luoIiVeb5Pb76gqyDT1usMmE113US3e6JiDdfL Nyu+xbaDUZMPK6VfVP3e+ZgdMwrYEwnR++NM+VYqsuKf5fLtBzNeGJ0JDN6c4wKB O6pX/GDfdjX7soABFaQarfJWp9HX9LHE6BZF+SWYDuKdvIoKW8H2I9Hnh6qq6ALW 1I+LEdq6bn1N5cuGr6FxC+crAoEGhKCB5kjQ51x1ZNBo8f132DlctAsL2BLar4Jk JK2F1B/pkVh6IW8wzJfjvQw+s9S/9pqFTSRPdRQf1sTanH1w5leLyuBASYqYUhpy xUA3qXiwmqchNpek8NxS =kZrl -----END PGP SIGNATURE----- --------------enig81D271FA730DEC5D8D196885-- --===============3687563582694962846== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============3687563582694962846==--