From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Bader
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Thu, 18 Oct 2012 22:52:27 +0200
Message-ID: <50806C0B.1060504@canonical.com>
References: <1350479456-4007-1-git-send-email-stefan.bader@canonical.com>
 <507EB27D.8050308@citrix.com>
 <1350482118.2460.74.camel@zakaz.uk.xensource.com>
 <507ECD06.2050407@canonical.com>
 <507ED038.8000806@citrix.com>
 <507FC51102000078000A235E@nat28.tlf.novell.com>
 <507FC71502000078000A236C@nat28.tlf.novell.com>
 <507FB1E1.8080700@canonical.com>
 <1350546483.28188.25.camel@dagon.hellion.org.uk>
 <507FD7DE.2010209@canonical.com>
 <507FFA5102000078000A250D@nat28.tlf.novell.com>
 <507FF964.9090009@canonical.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2958323107161820582=="
In-Reply-To: <507FF964.9090009@canonical.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Andrew Cooper, "xen-devel@lists.xen.org", Ian Campbell, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--===============2958323107161820582==
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature";
 boundary="------------enig07495305BB5B4065E9069FF1"

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig07495305BB5B4065E9069FF1
Content-Type: multipart/mixed;
 boundary="------------000604000200030200050803"

This is a multi-part message in MIME format.
--------------000604000200030200050803
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 18.10.2012 14:43, Stefan Bader wrote:
>> Obviously when this is an acquire not disabling interrupts, and
>> an interrupt comes in while in the poll hypercall (or about to go
>> there, or just having come back from one).
>>
>> Jan
>>
> Obviously.
> ;) Ok, so my thinking there was ok and it's one level deep max. At
> some point staring at things I start to question my sanity.
> A wild thinking would be whether in that case the interrupted spinlock may miss
> a wakeup forever when the unlocker only can check for the toplevel. Hm, but that
> should be easy to rule out by just adding an error to spin_unlock_slow when it
> fails to find anything...
>

Actually I begin to suspect that I may have just overlooked the most
obvious thing. Provoking question: are we sure we are on the same page
about the purpose of the spin_lock_flags variant of the pv lock ops
interface?

I begin to suspect that it really is not for giving a chance to re-enable
interrupts. Just what it should be used for is not clear to me. Anyway, it
seems all other places more or less ignore the flags and map themselves
back to an ignorant version of spinlock.
Also, I believe that the only high-level function that would end up
passing any flags would be the spin_lock_irqsave one. And I am pretty sure
that this one will expect interrupts to stay disabled.

So I tried the approach below, and it seems to survive the previously
breaking testcase for much longer than anything I tried before.

-Stefan

From f2ebb6626f3e3a00932bf1f4f75265f826c7fba9 Mon Sep 17 00:00:00 2001
From: Stefan Bader
Date: Thu, 18 Oct 2012 21:40:37 +0200
Subject: [PATCH 1/2] xen/pv-spinlock: Never enable interrupts in
 xen_spin_lock_slow()

I am not sure what exactly the spin_lock_flags variant of the
pv-spinlocks (or even in the arch spinlocks) should be used for.
But it should not be used as an invitation to enable irqs.

The only high-level variant that seems to end up there is the
spin_lock_irqsave one and that would always be used in a context
that expects the interrupts to be disabled.
The generic paravirt-spinlock code just maps the flags variant
to the one without flags, so just do the same and get rid of
all the stuff that is not needed anymore.

This seems to be resolving a weird locking issue seen when having
a high i/o database load on a PV Xen guest with multiple (8+ in
local experiments) CPUs. Well, thinking about it a second time
it seems like one of those "how did that ever work?" cases.

Signed-off-by: Stefan Bader
---
 arch/x86/xen/spinlock.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 83e866d..3330a1d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -24,7 +24,6 @@ static struct xen_spinlock_stats
 	u32 taken_slow_nested;
 	u32 taken_slow_pickup;
 	u32 taken_slow_spurious;
-	u32 taken_slow_irqenable;
 
 	u64 released;
 	u32 released_slow;
@@ -197,7 +196,7 @@ static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock
 	__this_cpu_write(lock_spinners, prev);
 }
 
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
+static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
 {
 	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 	struct xen_spinlock *prev;
@@ -218,8 +217,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 	ADD_STATS(taken_slow_nested, prev != NULL);
 
 	do {
-		unsigned long flags;
-
 		/* clear pending */
 		xen_clear_irq_pending(irq);
 
@@ -239,12 +236,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 			goto out;
 		}
 
-		flags = arch_local_save_flags();
-		if (irq_enable) {
-			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
-		}
-
 		/*
 		 * Block until irq becomes pending. If we're
 		 * interrupted at this point (after the trylock but
@@ -256,8 +247,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 		 */
 		xen_poll_irq(irq);
 
-		raw_local_irq_restore(flags);
-
 		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
 	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
 
@@ -270,7 +259,7 @@ out:
 	return ret;
 }
 
-static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
+static inline void __xen_spin_lock(struct arch_spinlock *lock)
 {
 	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 	unsigned timeout;
@@ -302,19 +291,19 @@ static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
 		spin_time_accum_spinning(start_spin_fast);
 
 	} while (unlikely(oldval != 0 &&
-			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
+			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock))));
 
 	spin_time_accum_total(start_spin);
 }
 
 static void xen_spin_lock(struct arch_spinlock *lock)
 {
-	__xen_spin_lock(lock, false);
+	__xen_spin_lock(lock);
 }
 
 static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
 {
-	__xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
+	__xen_spin_lock(lock);
 }
 
 static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
@@ -424,8 +413,6 @@ static int __init xen_spinlock_debugfs(void)
 			   &spinlock_stats.taken_slow_pickup);
 	debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
 			   &spinlock_stats.taken_slow_spurious);
-	debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_irqenable);
 
 	debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
 	debugfs_create_u32("released_slow", 0444, d_spin_debug,
-- 
1.7.9.5

--------------000604000200030200050803
Content-Type: text/x-diff;
 name="0001-xen-pv-spinlock-Never-enable-interrupts-in-xen_spin_.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename*0="0001-xen-pv-spinlock-Never-enable-interrupts-in-xen_spin_.pa";
 filename*1="tch"

--------------000604000200030200050803--

--------------enig07495305BB5B4065E9069FF1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQIcBAEBCgAGBQJQgGwLAAoJEOhnXe7L7s6jZjwP/jiw4qycURVgcb1Spwn4dW6b
Pyn6WoT014e6Fj4V9n24esDOYexjg8ORFf77trqi7qpp04KAxp7aAyTMQca6FAcW
GpVSuOPKkEaS0qexHvUZVChBmUz+47oSf4jdrhsowHAd+qWHIzLt5qN/NBFJArjf
3F4lAvqniqheHmp0TqW1oVyt2R82cFU0Ezik/mquABlUrE4kXIEJMaoJtTL93WsU
WhgRkdspJAHGEqN0VRXEd+uiVPkJTgjyZh+z8PBaAPhznCzlhuO1F+LUWAEANLzW
lmMq7iM7+SdkKupWYkPNuHJHOjlgn2Hm4lEhnXTCqQBQJzUmjSSk5fW4x9oYoOqu
Sy7RUEO8vFMJ9GRYzvhDCTHekXe1uWJ38JEoPZ2JFhL4f24OOk+ClGAIE0hh0If3
CdIvaNJoPxmCrEG4TQfLyEb9LWsSV1jDgOyYp4pHXPClLCaw/ceXk0cof2Phxu49
7OsuXcVrfrboQ1Pb0Mweqc/N+KAqIyxYFvNTcmmwTGBeX9Aj1CsVzF3TD49uZnb+
A6n7W56boqMt8Mg+u6FCkZH1a3sE2oH45YmlTU6YQ+WRGeZcLFx/mi8VQHsxB7zB
PG1oCS12+GltlJjZhedUFonaIIlLrpUt2effedpyLPoPX9BJhBJYW/JOheViaXUa
yC3XC3ehlhefqTd8ymmF
=OYGm
-----END PGP SIGNATURE-----

--------------enig07495305BB5B4065E9069FF1--

--===============2958323107161820582==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============2958323107161820582==--