From: Stefan Bader
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Wed, 17 Oct 2012 18:27:29 +0200
Message-ID: <507EDC71.4040400@canonical.com>
In-Reply-To: <507ED038.8000806@citrix.com>
References: <1350479456-4007-1-git-send-email-stefan.bader@canonical.com>
 <507EB27D.8050308@citrix.com>
 <1350482118.2460.74.camel@zakaz.uk.xensource.com>
 <507ECD06.2050407@canonical.com>
 <507ED038.8000806@citrix.com>
To: Andrew Cooper
Cc: "xen-devel@lists.xen.org", Ian Campbell, Konrad Rzeszutek Wilk

On 17.10.2012 17:35, Andrew Cooper wrote:
>> (XEN) Event channel information for domain 1:
>> (XEN) Polling vCPUs: {1,4,6}
>> (XEN) port [p/m]
>> (XEN) 4 [1/1]: s=6 n=0 x=0
>> (XEN) 10 [0/1]: s=6 n=1 x=0
>> (XEN) 28 [0/1]: s=6 n=4 x=0
>> (XEN) 40 [0/1]: s=6 n=6 x=0
>>
> s = state. 0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain, 4 =
> pirq, 5 = virq, 6 = ipi
> n = target vcpu id to notify
> x = boolean indicating whether xen is a consumer of the event channel or
> not.
>
> d = target domain (when appropriate) In this case, p is the target port.
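
To make the field legend above easier to apply to a longer dump, here is a minimal sketch that decodes one event channel line into named fields. It assumes the lines look exactly like the ones quoted here (with plain `=` signs); the function name `parse_evtchn_line`, the dictionary keys, and the reading of `m` in `[p/m]` as "masked" are my own assumptions, not anything Xen provides.

```python
import re

# State codes exactly as listed in the quoted explanation.
STATE_NAMES = {
    0: "free", 1: "reserved", 2: "unbound", 3: "inter-domain",
    4: "pirq", 5: "virq", 6: "ipi",
}

# Matches e.g. "(XEN) 10 [0/1]: s=6 n=1 x=0"; any trailing fields are ignored.
EVTCHN_RE = re.compile(
    r"\(XEN\)\s+(\d+)\s+\[(\d)/(\d)\]:\s+s=(\d+)\s+n=(\d+)\s+x=(\d+)"
)


def parse_evtchn_line(line):
    """Return a dict for one event channel line, or None if it is not one."""
    m = EVTCHN_RE.search(line)
    if m is None:
        return None
    port, pending, masked, state, vcpu, xen = (int(g) for g in m.groups())
    return {
        "port": port,
        "pending": bool(pending),
        "masked": bool(masked),  # assumption: "m" in [p/m] means masked
        "state": STATE_NAMES.get(state, "unknown"),
        "notify_vcpu": vcpu,
        "xen_consumer": bool(xen),
    }


# The four lines quoted above, including the mail quote prefix.
dump = """\
>> (XEN) 4 [1/1]: s=6 n=0 x=0
>> (XEN) 10 [0/1]: s=6 n=1 x=0
>> (XEN) 28 [0/1]: s=6 n=4 x=0
>> (XEN) 40 [0/1]: s=6 n=6 x=0
"""

for raw in dump.splitlines():
    print(parse_evtchn_line(raw))
```

The pattern uses `search` rather than `match` so that quoted lines (with leading `>>`) decode as well; header lines such as `(XEN) port [p/m]` simply return `None`.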
>

Thanks (at least something learned today :)) One thing I noticed here, in the
event channel info above, pending is 0 for channel 10, 28 and 40 (and set for 4
which is the spinlock ipi for cpu 0). But in the VCPU info below (another
unknown: has=T and F) it says upcall_pend for all of them. Unfortunately that
might just mean that things change...

>> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={3} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>
> So in this case, vcpu 1 is in a poll, on port 10, which is an IPI event
> channel for itself.
>
> Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 60.
>
>
> I wonder if there is possibly a race condition between notifying that a
> lock has been unlocked, and another vcpu trying to poll after deciding
> that the lock is locked.

There has to be something somewhere, I just cannot spot it. The unlocking cpu
will do a wmb() before setting the lock to 0, then a mb() and then check for
spinners. When failing the quick path a locker will first set the lockspinner
entry, then do a wmb() and increment the spinners count. After that it clears
the event pending and then checks the lock again before actually going into
poll.

>
> The other option is that there is a bug in working out which event
> channel to notify when a lock is unlocked.

I had thought I saw one thing that I tried to fix with my patch.
Another train of thought would have been any other cpu always grabbing the lock
as soon as it gets released and so preventing any cpu in poll from succeeding.
But that would then show the lock as locked...

>
> ~Andrew
>
>>
>>
>> Backtraces would be somewhat inconsistent (as always). Note, I should mention
>> that I still had a kernel with my patch applied on that guest. That changes
>> things a bit (actually it takes a bit longer to hang but again that might be
>> just a matter of timing). The strange lock state of 2 spinners on an unlocked
>> lock remains the same with or without it.
>>
>> One question about the patch actually, would anybody think that there could be a
>> case where the unlocking cpu has itself on the spinners list? I did not think so
>> but that might be wrong.
>>>
>>> The IRQ handler for the spinlock evtchn in Linux is:
>>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>>> {
>>>         BUG();
>>>         return IRQ_HANDLED;
>>> }
>>>
>>> and right after we register it:
>>>         disable_irq(irq); /* make sure it's never delivered */
>>>
>>> The is no enable -- ignoring bugs of which there have been couple of
>>> instances, but those trigger the BUG() so are pretty obvious.
>>>
>>> Ian.
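
Coming back to the unlock and slow-path ordering described further up (unlocker: set lock to 0, then check for spinners; locker: set the lockspinner entry, increment the spinners count, clear the pending event, re-check the lock, poll), the following is a toy sketch that enumerates every interleaving of one unlocker and one spinner and reports schedules in which the spinner would sit in poll with no event pending. It is only a model under loud assumptions: memory is treated as sequentially consistent (so the wmb()/mb() calls are implied and a barrier problem cannot show up), there is exactly one spinner, and all step names (`register`, `count`, `clear`, `recheck`, `unlock`, `kick`) are hypothetical labels of mine, not kernel identifiers.

```python
from itertools import combinations

# Spinner slow path in the order described in this mail.
DESCRIBED = ["register", "count", "clear", "recheck"]
# Hypothetical variant for comparison: pending is cleared after the re-check.
SWAPPED = ["register", "count", "recheck", "clear"]

UNLOCKER = ["unlock", "kick"]  # set lock to 0, then check for spinners


def spinner_stuck(schedule, spinner_steps):
    """Run one interleaving; True if the spinner polls with nothing pending."""
    lock, spinners, registered, pending = 1, 0, False, False
    acquired = will_poll = False
    s_iter, u_iter = iter(spinner_steps), iter(UNLOCKER)
    for who in schedule:
        if who == "U":
            step = next(u_iter)
            if step == "unlock":
                lock = 0
            elif spinners > 0 and registered:  # "kick": notify the spinner
                pending = True
        else:
            step = next(s_iter)
            if step == "register":
                registered = True
            elif step == "count":
                spinners += 1
            elif step == "clear":
                pending = False
            elif lock == 0:  # "recheck" finds the lock free and takes it
                lock, acquired = 1, True
            else:  # "recheck" finds the lock held, so the spinner will poll
                will_poll = True
    return will_poll and not acquired and not pending


def lost_wakeups(spinner_steps):
    """Return all schedules (as strings of S/U) that strand the spinner."""
    n = len(spinner_steps) + len(UNLOCKER)
    stuck = []
    for u_pos in combinations(range(n), len(UNLOCKER)):
        schedule = ["U" if i in u_pos else "S" for i in range(n)]
        if spinner_stuck(schedule, spinner_steps):
            stuck.append("".join(schedule))
    return stuck


print("described order:", lost_wakeups(DESCRIBED))
print("swapped order:  ", lost_wakeups(SWAPPED))
```

Within this model the described order yields no stranded spinner, while the swapped variant is stranded by the single schedule `SSSUUS` (kick lands between the re-check and the clear). That only says the ordering itself looks sound under these assumptions; it says nothing about barriers, nested spinning, or which event channel gets notified.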
>>> >>> >> >> >> >> >=20 >=20 >=20 > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >=20 --------------enig2567AA1A474DF249DA5F091A Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQIcBAEBCgAGBQJQftxyAAoJEOhnXe7L7s6j3vsQAMcmgY1Ebao/FjiQQZhub8+y 5LLEOO/DPPlmUICssoz9IKrN4ZOoPGGKSlsuWVIALvbl+DgT8wBLISEAvtcAcXnn k7JknLkO49vkgJgy0TstSn4jM3xo5HbyubaALQDrPXawVqFOz133t02b0LLDD3B5 tZNqYxxWSByX/XYi1UJCQ1LeUdUqyK8luxtZsVL9KP2lTsYxfQOD7+QZ90fReALe fvht5N/Dm1RFPkcGdirbP9JlutjbfSsbPHmR5FvHyJ6eYw46tiIEcGXnp8cYOh6y pGi1SAAg/c2zOAytPibut2CDdpqwJVX7VzrJsjhj2lnVwvge6ADJS6tr7wgMMj5V 7J/c1QeHj+u0Dkgz+UutgrHcXHyRwv2iLDq6PppjY5VjbPZlYU6Mhs8E3bzmZzBT t1LjGDj4LcoPtH2I3XEr4I5ivClFcLlvEDAVQ45iYy1fp4NxBykc0A0/TgwQ6NIS E+JSd4DZM5MDlftue4AelKb9/4lE6G/V+9o4SSSPDcDhJCKZO9pSDTUty9PHFwti 96XCF0o2F2qH7aOI1UwFfimHGViZshHB0hvxtanoQPw0kulpxhKDUu1Q8CPnBTlm 7zuKEHtNjf+hoJs62dPPNx+EfgDXS4JeUj54IoMrPoPMBljDRW+q1aQ1WdHa9AMT GmjYaVKtKTd7P05P/kaa =Wd+1 -----END PGP SIGNATURE----- --------------enig2567AA1A474DF249DA5F091A-- --===============2837620664345663613== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============2837620664345663613==--