From: Stefan Bader <stefan.bader@canonical.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Thu, 18 Oct 2012 22:52:27 +0200
Message-ID: <50806C0B.1060504@canonical.com>
In-Reply-To: <507FF964.9090009@canonical.com>



On 18.10.2012 14:43, Stefan Bader wrote:
>> Obviously when this is an acquire not disabling interrupts, and
>> an interrupt comes in while in the poll hypercall (or about to go
>> there, or just having come back from one).
>>
>> Jan
>>
> Obviously. ;) Ok, so my thinking there was ok and it's one level deep max. At
> some point, staring at things, I start to question my sanity.
> A wild thought would be whether in that case the interrupted spinlock may miss
> a wakeup forever when the unlocker can only check for the toplevel. Hm, but that
> should be easy to rule out by just adding an error to spin_unlock_slow when it
> fails to find anything...
> 
Actually, I am beginning to suspect that I simply overlooked the most obvious
thing. Provocative question: are we sure we are on the same page about the
purpose of the spin_lock_flags variant of the pv lock ops interface?

I now suspect that it really is not meant as a chance to re-enable
interrupts. Just what it should be used for is not clear to me. In any case,
all other places seem to more or less ignore the flags and map themselves back
to the plain spinlock variant that takes no flags.
Also, I believe the only high-level function that ends up passing any flags
is spin_lock_irqsave, and I am pretty sure that one expects interrupts to
stay disabled.
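
To make that expectation concrete, here is a small user-space model. It is
only a sketch and not kernel code: every name in it (irqs_enabled,
model_lock_slow, model_lock_irqsave, count_irqs_in_slow_path) is made up for
illustration, and the "interrupt" is just a counter. It assumes the flags
handed to the flags variant are the ones saved before interrupts were
disabled, and shows that honouring them lets an interrupt run inside the
region the irqsave caller believes is protected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static bool irqs_enabled = true;     /* models the CPU interrupt flag */
static int irqs_seen_while_waiting;  /* "interrupts" delivered in the slow path */

/* An arriving interrupt is only delivered while interrupts are enabled. */
static void maybe_deliver_irq(void)
{
	if (irqs_enabled)
		irqs_seen_while_waiting++;
}

/* Slow path; irq_enable mirrors the parameter that the patch removes. */
static void model_lock_slow(bool irq_enable)
{
	bool saved = irqs_enabled;

	if (irq_enable)
		irqs_enabled = true;  /* old behaviour: re-enable while polling */

	maybe_deliver_irq();          /* an event arrives while we wait */

	irqs_enabled = saved;         /* restore the state seen on entry */
}

/* Models an irqsave-style acquire: save state, disable, then take the lock. */
static bool model_lock_irqsave(bool honour_flags)
{
	bool flags = irqs_enabled;    /* state before disabling */

	irqs_enabled = false;
	model_lock_slow(honour_flags && flags);
	assert(!irqs_enabled);        /* caller gets the lock with irqs off */
	return flags;
}

/* Returns how many interrupts ran between irqsave and lock acquisition. */
int count_irqs_in_slow_path(bool honour_flags)
{
	irqs_enabled = true;
	irqs_seen_while_waiting = 0;
	(void)model_lock_irqsave(honour_flags);
	return irqs_seen_while_waiting;
}
```

With honour_flags set, the model delivers one interrupt during the wait;
with it cleared, none. Whether that window is what actually causes the
lockups is still only my suspicion.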

So I tried the approach below, and it seems to survive the previously failing
testcase for much longer than anything I tried before.
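
Stripped of the Xen details, the idea is just that the flags entry point
forwards to the plain one. A minimal stand-alone sketch of that shape follows;
the struct and function names are hypothetical stand-ins, not the real
pv lock ops definitions, and the "lock" is only a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock; it only counts acquisitions. */
struct model_lock {
	int taken;
};

/* Hypothetical ops table with a plain and a flags entry point. */
struct model_lock_ops {
	void (*spin_lock)(struct model_lock *lock);
	void (*spin_lock_flags)(struct model_lock *lock, unsigned long flags);
};

static void model_spin_lock(struct model_lock *lock)
{
	lock->taken++;
}

/* Flags variant: the flags argument is deliberately ignored. */
static void model_spin_lock_flags(struct model_lock *lock, unsigned long flags)
{
	(void)flags;
	model_spin_lock(lock);
}

static const struct model_lock_ops model_ops = {
	.spin_lock       = model_spin_lock,
	.spin_lock_flags = model_spin_lock_flags,
};

/* Takes a fresh lock through the flags entry point; returns the take count. */
int take_with_flags(unsigned long flags)
{
	struct model_lock lock = { 0 };

	model_ops.spin_lock_flags(&lock, flags);
	return lock.taken;
}
```

Calling take_with_flags(0) and take_with_flags(0x200) (0x200 being the IF bit
in x86 EFLAGS) behaves identically, which is the whole point: the saved
interrupt state no longer influences how the lock is taken.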

-Stefan

From f2ebb6626f3e3a00932bf1f4f75265f826c7fba9 Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Thu, 18 Oct 2012 21:40:37 +0200
Subject: [PATCH 1/2] xen/pv-spinlock: Never enable interrupts in
 xen_spin_lock_slow()

I am not sure what exactly the spin_lock_flags variant of the
pv-spinlocks (or even in the arch spinlocks) should be used for.
But it should not be used as an invitation to enable irqs.

The only high-level variant that seems to end up there is the
spin_lock_irqsave one and that would always be used in a context
that expects the interrupts to be disabled.
The generic paravirt-spinlock code just maps the flags variant
to the one without flags, so just do the same and get rid of
all the stuff that is not needed anymore.

This seems to be resolving a weird locking issue seen when having
a high i/o database load on a PV Xen guest with multiple (8+ in
local experiments) CPUs. Well, thinking about it a second time
it seems like one of those "how did that ever work?" cases.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 arch/x86/xen/spinlock.c |   23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 83e866d..3330a1d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -24,7 +24,6 @@ static struct xen_spinlock_stats
 	u32 taken_slow_nested;
 	u32 taken_slow_pickup;
 	u32 taken_slow_spurious;
-	u32 taken_slow_irqenable;

 	u64 released;
 	u32 released_slow;
@@ -197,7 +196,7 @@ static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock
 	__this_cpu_write(lock_spinners, prev);
 }

-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
+static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
 {
 	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 	struct xen_spinlock *prev;
@@ -218,8 +217,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 	ADD_STATS(taken_slow_nested, prev != NULL);

 	do {
-		unsigned long flags;
-
 		/* clear pending */
 		xen_clear_irq_pending(irq);

@@ -239,12 +236,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 			goto out;
 		}

-		flags = arch_local_save_flags();
-		if (irq_enable) {
-			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
-		}
-
 		/*
 		 * Block until irq becomes pending.  If we're
 		 * interrupted at this point (after the trylock but
@@ -256,8 +247,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
 		 */
 		xen_poll_irq(irq);

-		raw_local_irq_restore(flags);
-
 		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
 	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */

@@ -270,7 +259,7 @@ out:
 	return ret;
 }

-static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
+static inline void __xen_spin_lock(struct arch_spinlock *lock)
 {
 	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 	unsigned timeout;
@@ -302,19 +291,19 @@ static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
 		spin_time_accum_spinning(start_spin_fast);

 	} while (unlikely(oldval != 0 &&
-			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
+			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock))));

 	spin_time_accum_total(start_spin);
 }

 static void xen_spin_lock(struct arch_spinlock *lock)
 {
-	__xen_spin_lock(lock, false);
+	__xen_spin_lock(lock);
 }

 static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
 {
-	__xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
+	__xen_spin_lock(lock);
 }

 static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
@@ -424,8 +413,6 @@ static int __init xen_spinlock_debugfs(void)
 			   &spinlock_stats.taken_slow_pickup);
 	debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
 			   &spinlock_stats.taken_slow_spurious);
-	debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_irqenable);

 	debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
 	debugfs_create_u32("released_slow", 0444, d_spin_debug,
-- 
1.7.9.5



