From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefan Bader <stefan.bader@canonical.com>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL
 load
Date: Wed, 17 Oct 2012 15:45:56 +0200
Message-ID: <507EB694.8070105@canonical.com>
References: <1350479456-4007-1-git-send-email-stefan.bader@canonical.com>
	<507EB27D.8050308@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3687563582694962846=="
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <507EB27D.8050308@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--===============3687563582694962846==
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature";
 boundary="------------enig81D271FA730DEC5D8D196885"

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig81D271FA730DEC5D8D196885
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 17.10.2012 15:28, Andrew Cooper wrote:
> On 17/10/12 14:10, Stefan Bader wrote:
>> I am currently looking at a bug report[1] which is happening when
>> a Xen PVM guest with multiple VCPUs is running a high IO database
>> load (a test script is available in the bug report).
>>
>> In experimenting it seems that this happens (or is getting more
>> likely) when the number of VCPUs is 8 or higher (though I have
>> not tried 6, only 2 and 4), having autogroup enabled seems to
>> make it more likely, too (at some point thought it would actually
>> prevent it but we were wrong) and pv spinlocks enabled.
>> It has happened with older (3.4.3) and newer (4.1.2) versions of
>> Xen as a host and with 3.2 and 3.5 kernels as guests.
>>
>> The pv spinlock assumption I will try to get re-verified by asking
>> to reproduce under a real load with a kernel that just disables
>> that. However, the dumps I am looking at really do look odd there.
>>
>> The first dump I looked at had the spinlock of runqueue[0] being
>> placed into the per-cpu lock_spinners variable for cpu#0 and
>> cpu#7 (doing my tests with 8 VCPUs). So apparently both cpus were
>> waiting on the slow path for it to become free. Though actually
>> it was free! Now, here is one issue I have in understanding the
>> dump: the back traces produced in crash are in the normal form
>> not showing any cpu in the poll_irq HV call. Only when using
>> the form that uses the last known stack location and displays
>> all text symols found will get close for cpu#7. cpu#0 still does
>> not seem to be anywhere close. This could be a problem with crash,
>> or with the way PVM works, I am not sure.
>>
>> Anyway, from what I could take from that situation, it seemed that
>> cpu#1 (that one had soft lockup warnings in the log) seemed to try
>> to do a wake_up on the task that just seemed to have done an io
>> schedule on cpu#7 (but the waitqueue_head spinlock is currently
>> locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle
>> migration (though the lock currently is free). So I guessed that
>> maybe cpu#0 was just woken for the free lock but has not grabbed
>> it yet.
>>
>> From there I wondered whether something that acquires a spinlock
>> usually more than the quick path timeout (and this causes new
>> lockers to go into the slow path), could cause a stall on a high
>> numbered cpu when the lock is contented (as the search and
>> signalling is only done for the first waiter starting from 0).
>> That lead to below patch. The good thing about it, it does not
>> break things immediately, the bad thing, it does not help with
>> the problem. :/
>>
>> The interesting thing when looking at a second dump, taken with
>> a testkernel using the patch below, was that again 2 cpus seemed
>> to spin slow on a lock that was free in the dump. And again at
>> least one did not seem to be doing anything spinlock related
>> (anymore?).
>> One other detail (but that could be just incidence as well) was
>> that in unmodified kernel I usually would end up with soft
>> lockup messages, with the patched kernel, that was a stall
>> detected by rcu_bh (0 and 1 stalled, detected by 3).
>>
>> Now I am a bit confused and wonder about some basic things:
>> 1. When a cpu goes into the slow lock path and into the poll_irq,
>>    shouldn't I expect this one to be in hypercall_page in the
>>    dump?
>> 2. When does the whole handling for interrupted spinlock wait
>>    apply? I assume only for a spinlock taken without irqs
>>    disabled and then trying to acquire another one in an
>>    interrupt handler. Is that correct?
>> 3. Not really related but I just stumbled over it:
>>    In xen_spin_trylock: "asm("xchgb %b0,%1"...)
>>    What does the "b" part of %b0 do? I thought xchgb already
>>    would say it is a byte value...
>>
>> But putting aside those questions, the fact that two times
>> there was two cpus waiting on the same lock which from the
>> lock count (=3D0) was free seems a bit too much of a coincidence.
>> Oh and the spinners count in the lock was 2 as one would
>> expect.
>>
>> struct rq {
>>   lock =3D {
>>     raw_lock =3D {
>>       {
>>         head_tail =3D 512,=20
>>         tickets =3D {
>>           head =3D 0 '\000',=20
>>           tail =3D 2 '\002'
>>         }
>>       }
>>     }
>>   },=20
>>   nr_running =3D 1,
>>   ...
>> }
>>
>> I really don't know how this happens. Especially cpu#0 seems at
>> least past the wakeup and should have removed itself from the
>> spinners list...
>> =20
>> -Stefan
>=20
> We (Citrix) have an outstanding bug, on 2.6.32 classic, which we have
> never managed to get to the bottom of (because of inability to reliably=

> reproduce), which might be related.
>=20
> In our case, certain processes were locking up, and it turned out that
> the kernel was issuing SCHOP_poll hypercalls (same the stack trace on
> your launchpad ticket) on its own spinlock IPI event channel
> (understandable, as its a spinlock), but with the event channel masked,=

> so it would never wake up from the poll.

Unfortunately either crash looking at it or something else is wrong for t=
he PV
guest as looking at HYPERVISOR_shared_info appears completely blank.

>=20
> Can you reliably reproduce the issue? and if so, would it be possible t=
o
> capture hypervisor debug keys q and e when the badness happens?  This
> would clearly show if our bugs are related or not

Yes, as mentioned the python script in the bug report does that when meet=
ing the
other conditions. I certainly can do the debug keys on that.

>=20
> Our repro case took about a month to reproduce, and unfortunately
> turning lockdep on has appeared to fix the problem.

Yeah, having lockdep enabled was another thing that made it less likely. =
Though
some reporters had lockups with it, too. Just much less likely.

-Stefan

>=20
> ~Andrew
>=20
>>
>> [1] http://bugs.launchpad.net/bugs/1011792
>>


--------------enig81D271FA730DEC5D8D196885
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQIcBAEBCgAGBQJQfraVAAoJEOhnXe7L7s6jtCQP/1Li9oKebZMBeK1os42nAu0v
uOeizVRkV4Brl0hSTjXRJnzPWXnOAho5mKU429m0VJrxLeo8uHHqMApTJBF8YEFF
Nf+NkEg18khpLn08jlFt6Gz2aKMYi4zoRr8nBtIzwrw21bOSlOzMaT2EO39UtHvm
PX3HkMZWfkvEp/FTNBjynksOTMCp8W/dGXCMjA4h58fUPujl4IYjWhIUwvsI++Rc
m61PUi/LeWl0eE8xBYtp/piJwYkzl8/dLkyHo+qu0HlQ4jrQVYKsv5w+f0hfP1sY
uWAtGDFWALCJY8QYVqM81QiS3mPEWbG9B3pnJxarfSZx9IED62Ea4rXh3Vdm3xKc
Hk22e2abi/NSq81EUdbJMaH4Y0luoIiVeb5Pb76gqyDT1usMmE113US3e6JiDdfL
Nyu+xbaDUZMPK6VfVP3e+ZgdMwrYEwnR++NM+VYqsuKf5fLtBzNeGJ0JDN6c4wKB
O6pX/GDfdjX7soABFaQarfJWp9HX9LHE6BZF+SWYDuKdvIoKW8H2I9Hnh6qq6ALW
1I+LEdq6bn1N5cuGr6FxC+crAoEGhKCB5kjQ51x1ZNBo8f132DlctAsL2BLar4Jk
JK2F1B/pkVh6IW8wzJfjvQw+s9S/9pqFTSRPdRQf1sTanH1w5leLyuBASYqYUhpy
xUA3qXiwmqchNpek8NxS
=kZrl
-----END PGP SIGNATURE-----

--------------enig81D271FA730DEC5D8D196885--


--===============3687563582694962846==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============3687563582694962846==--