* Xen PVM: Strange lockups when running PostgreSQL load
@ 2012-10-17 13:10 Stefan Bader
2012-10-17 13:28 ` Andrew Cooper
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-17 13:10 UTC (permalink / raw)
To: xen-devel; +Cc: Konrad Rzeszutek Wilk
I am currently looking at a bug report[1] about lockups which happen
when a Xen PVM guest with multiple VCPUs is running a high IO database
load (a test script is available in the bug report).

In experimenting it seems that this happens (or gets more likely) when
the number of VCPUs is 8 or higher (though I have not tried 6, only 2
and 4); having autogroup enabled seems to make it more likely, too (at
some point we thought it would actually prevent it, but we were wrong),
as does having pv spinlocks enabled.
It has happened with older (3.4.3) and newer (4.1.2) versions of
Xen as a host and with 3.2 and 3.5 kernels as guests.

I will try to get the pv spinlock assumption re-verified by asking
reporters to reproduce under a real load with a kernel that simply
disables it. However, the dumps I am looking at really do look odd
there.

The first dump I looked at had the spinlock of runqueue[0] placed
in the per-cpu lock_spinners variable for cpu#0 and cpu#7 (I am
doing my tests with 8 VCPUs). So apparently both cpus were waiting
in the slow path for it to become free. Though actually it was free!
Now, here is one issue I have in understanding the dump: the back
traces produced in crash in their normal form do not show any cpu in
the poll_irq HV call. Only when using the form that takes the last
known stack location and displays all text symbols found do I get
close for cpu#7. cpu#0 still does not seem to be anywhere close. This
could be a problem with crash, or with the way PVM works; I am not
sure.

Anyway, from what I could take from that situation, it seemed that
cpu#1 (the one that had soft lockup warnings in the log) was trying
to do a wake_up on a task that seemed to have just done an io
schedule on cpu#7 (but the waitqueue_head spinlock is currently
locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle
migration (though the lock currently is free). So I guessed that
maybe cpu#0 was just woken for the free lock but has not grabbed
it yet.

From there I wondered whether something that holds a spinlock for
longer than the quick-path timeout (which causes new lockers to go
into the slow path) could cause a stall on a high-numbered cpu when
the lock is contended (as the search and signalling is only done for
the first waiter, starting from 0). That led to the patch below. The
good thing about it: it does not break things immediately. The bad
thing: it does not help with the problem. :/

The interesting thing when looking at a second dump, taken with a
test kernel using the patch below, was that again 2 cpus seemed to
spin slowly on a lock that was free in the dump. And again at least
one did not seem to be doing anything spinlock related (anymore?).
One other detail (though that could be just coincidence as well) was
that with the unmodified kernel I usually would end up with soft
lockup messages; with the patched kernel, it was a stall detected by
rcu_bh (0 and 1 stalled, detected by 3).
Now I am a bit confused and wonder about some basic things:
1. When a cpu goes into the slow lock path and into the poll_irq,
shouldn't I expect this one to be in hypercall_page in the
dump?
2. When does the whole handling for interrupted spinlock wait
apply? I assume only for a spinlock taken without irqs
disabled and then trying to acquire another one in an
interrupt handler. Is that correct?
3. Not really related but I just stumbled over it:
In xen_spin_trylock: "asm("xchgb %b0,%1"...)
What does the "b" part of %b0 do? I thought xchgb already
would say it is a byte value...
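For reference, the full function in these kernels looks roughly like
this (quoted from memory from arch/x86/xen/spinlock.c, so treat it as
a sketch rather than the exact source):

static int xen_spin_trylock(struct arch_spinlock *lock)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
        u8 old = 1;

        /* Atomically swap 1 into the lock byte; if we got 0 back,
         * the lock was free and is now ours. */
        asm("xchgb %b0,%1"
            : "+q" (old), "+m" (xl->lock) : : "memory");

        return old == 0;
}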
But putting aside those questions, the fact that both times there
were two cpus waiting on the same lock which, going by the lock
count (=0), was free seems a bit too much of a coincidence.
Oh, and the spinners count in the lock was 2, as one would
expect.
struct rq {
  lock = {
    raw_lock = {
      {
        head_tail = 512,
        tickets = {
          head = 0 '\000',
          tail = 2 '\002'
        }
      }
    }
  },
  nr_running = 1,
  ...
}
I really don't know how this happens. Especially cpu#0 seems at
least past the wakeup and should have removed itself from the
spinners list...
-Stefan
[1] http://bugs.launchpad.net/bugs/1011792
From 635a4e101ccbc9a324c8000f7e264ed4e646592c Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Tue, 16 Oct 2012 17:46:09 +0200
Subject: [PATCH] xen/spinlocks: Make wakeup fairer
Instead of sending the IPI signalling the free lock to the first
online CPU found waiting for it, start the search from the CPU
that had the lock last.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
arch/x86/xen/spinlock.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d69cc6c..8b86efb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -320,17 +320,23 @@ static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
int cpu;
+ int this_cpu = smp_processor_id();
ADD_STATS(released_slow, 1);
- for_each_online_cpu(cpu) {
- /* XXX should mix up next cpu selection */
- if (per_cpu(lock_spinners, cpu) == xl) {
- ADD_STATS(released_slow_kicked, 1);
- xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
- break;
- }
- }
+ cpu = cpumask_next(this_cpu, cpu_online_mask);
+ do {
+ if (cpu < nr_cpu_ids) {
+ if (per_cpu(lock_spinners, cpu) == xl) {
+ ADD_STATS(released_slow_kicked, 1);
+ xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+ break;
+ }
+ } else
+ cpu = -1;
+
+ cpu = cpumask_next(cpu, cpu_online_mask);
+ } while (cpu != this_cpu);
}
static void xen_spin_unlock(struct arch_spinlock *lock)
--
1.7.9.5
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 13:10 Xen PVM: Strange lockups when running PostgreSQL load Stefan Bader
@ 2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45 ` Stefan Bader
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-10-17 13:28 UTC (permalink / raw)
To: xen-devel, stefan.bader, Konrad Rzeszutek Wilk
On 17/10/12 14:10, Stefan Bader wrote:
> I am currently looking at a bug report[1] about lockups which happen
> when a Xen PVM guest with multiple VCPUs is running a high IO database
> load (a test script is available in the bug report).
>
> In experimenting it seems that this happens (or gets more likely) when
> the number of VCPUs is 8 or higher (though I have not tried 6, only 2
> and 4); having autogroup enabled seems to make it more likely, too (at
> some point we thought it would actually prevent it, but we were wrong),
> as does having pv spinlocks enabled.
> It has happened with older (3.4.3) and newer (4.1.2) versions of
> Xen as a host and with 3.2 and 3.5 kernels as guests.
>
> I will try to get the pv spinlock assumption re-verified by asking
> reporters to reproduce under a real load with a kernel that simply
> disables it. However, the dumps I am looking at really do look odd
> there.
>
> The first dump I looked at had the spinlock of runqueue[0] placed
> in the per-cpu lock_spinners variable for cpu#0 and cpu#7 (I am
> doing my tests with 8 VCPUs). So apparently both cpus were waiting
> in the slow path for it to become free. Though actually it was free!
> Now, here is one issue I have in understanding the dump: the back
> traces produced in crash in their normal form do not show any cpu in
> the poll_irq HV call. Only when using the form that takes the last
> known stack location and displays all text symbols found do I get
> close for cpu#7. cpu#0 still does not seem to be anywhere close. This
> could be a problem with crash, or with the way PVM works; I am not
> sure.
>
> Anyway, from what I could take from that situation, it seemed that
> cpu#1 (the one that had soft lockup warnings in the log) was trying
> to do a wake_up on a task that seemed to have just done an io
> schedule on cpu#7 (but the waitqueue_head spinlock is currently
> locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle
> migration (though the lock currently is free). So I guessed that
> maybe cpu#0 was just woken for the free lock but has not grabbed
> it yet.
>
> From there I wondered whether something that holds a spinlock for
> longer than the quick-path timeout (which causes new lockers to go
> into the slow path) could cause a stall on a high-numbered cpu when
> the lock is contended (as the search and signalling is only done for
> the first waiter, starting from 0). That led to the patch below. The
> good thing about it: it does not break things immediately. The bad
> thing: it does not help with the problem. :/
>
> The interesting thing when looking at a second dump, taken with a
> test kernel using the patch below, was that again 2 cpus seemed to
> spin slowly on a lock that was free in the dump. And again at least
> one did not seem to be doing anything spinlock related (anymore?).
> One other detail (though that could be just coincidence as well) was
> that with the unmodified kernel I usually would end up with soft
> lockup messages; with the patched kernel, it was a stall detected by
> rcu_bh (0 and 1 stalled, detected by 3).
>
> Now I am a bit confused and wonder about some basic things:
> 1. When a cpu goes into the slow lock path and into the poll_irq,
> shouldn't I expect this one to be in hypercall_page in the
> dump?
> 2. When does the whole handling for interrupted spinlock wait
> apply? I assume only for a spinlock taken without irqs
> disabled and then trying to acquire another one in an
> interrupt handler. Is that correct?
> 3. Not really related but I just stumbled over it:
> In xen_spin_trylock: "asm("xchgb %b0,%1"...)
> What does the "b" part of %b0 do? I thought xchgb already
> would say it is a byte value...
>
> But putting aside those questions, the fact that both times there
> were two cpus waiting on the same lock which, going by the lock
> count (=0), was free seems a bit too much of a coincidence.
> Oh, and the spinners count in the lock was 2, as one would
> expect.
>
> struct rq {
>   lock = {
>     raw_lock = {
>       {
>         head_tail = 512,
>         tickets = {
>           head = 0 '\000',
>           tail = 2 '\002'
>         }
>       }
>     }
>   },
>   nr_running = 1,
>   ...
> }
>
> I really don't know how this happens. Especially cpu#0 seems at
> least past the wakeup and should have removed itself from the
> spinners list...
>
> -Stefan
We (Citrix) have an outstanding bug, on 2.6.32 classic, which we have
never managed to get to the bottom of (because of an inability to
reliably reproduce), and which might be related.
In our case, certain processes were locking up, and it turned out that
the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
on your launchpad ticket) on its own spinlock IPI event channel
(understandable, as it is a spinlock), but with the event channel
masked, so it would never wake up from the poll.
Can you reliably reproduce the issue? If so, would it be possible to
capture hypervisor debug keys 'q' and 'e' when the badness happens?
This would clearly show whether our bugs are related or not.
Our repro case took about a month to reproduce, and unfortunately
turning lockdep on has appeared to fix the problem.
~Andrew
>
> [1] http://bugs.launchpad.net/bugs/1011792
>
> From 635a4e101ccbc9a324c8000f7e264ed4e646592c Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Tue, 16 Oct 2012 17:46:09 +0200
> Subject: [PATCH] xen/spinlocks: Make wakeup fairer
>
> Instead of sending the IPI signalling the free lock to the first
> online CPU found waiting for it, start the search from the CPU
> that had the lock last.
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
> arch/x86/xen/spinlock.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index d69cc6c..8b86efb 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -320,17 +320,23 @@ static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
> static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
> {
> int cpu;
> + int this_cpu = smp_processor_id();
>
> ADD_STATS(released_slow, 1);
>
> - for_each_online_cpu(cpu) {
> - /* XXX should mix up next cpu selection */
> - if (per_cpu(lock_spinners, cpu) == xl) {
> - ADD_STATS(released_slow_kicked, 1);
> - xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> - break;
> - }
> - }
> + cpu = cpumask_next(this_cpu, cpu_online_mask);
> + do {
> + if (cpu < nr_cpu_ids) {
> + if (per_cpu(lock_spinners, cpu) == xl) {
> + ADD_STATS(released_slow_kicked, 1);
> + xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> + break;
> + }
> + } else
> + cpu = -1;
> +
> + cpu = cpumask_next(cpu, cpu_online_mask);
> + } while (cpu != this_cpu);
> }
>
> static void xen_spin_unlock(struct arch_spinlock *lock)
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 13:28 ` Andrew Cooper
@ 2012-10-17 13:45 ` Stefan Bader
2012-10-17 13:55 ` Ian Campbell
2012-10-17 14:51 ` Jan Beulich
2 siblings, 0 replies; 29+ messages in thread
From: Stefan Bader @ 2012-10-17 13:45 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, xen-devel
On 17.10.2012 15:28, Andrew Cooper wrote:
> On 17/10/12 14:10, Stefan Bader wrote:
>> I am currently looking at a bug report[1] about lockups which happen
>> when a Xen PVM guest with multiple VCPUs is running a high IO database
>> load (a test script is available in the bug report).
>>
>> In experimenting it seems that this happens (or gets more likely) when
>> the number of VCPUs is 8 or higher (though I have not tried 6, only 2
>> and 4); having autogroup enabled seems to make it more likely, too (at
>> some point we thought it would actually prevent it, but we were wrong),
>> as does having pv spinlocks enabled.
>> It has happened with older (3.4.3) and newer (4.1.2) versions of
>> Xen as a host and with 3.2 and 3.5 kernels as guests.
>>
>> I will try to get the pv spinlock assumption re-verified by asking
>> reporters to reproduce under a real load with a kernel that simply
>> disables it. However, the dumps I am looking at really do look odd
>> there.
>>
>> The first dump I looked at had the spinlock of runqueue[0] placed
>> in the per-cpu lock_spinners variable for cpu#0 and cpu#7 (I am
>> doing my tests with 8 VCPUs). So apparently both cpus were waiting
>> in the slow path for it to become free. Though actually it was free!
>> Now, here is one issue I have in understanding the dump: the back
>> traces produced in crash in their normal form do not show any cpu in
>> the poll_irq HV call. Only when using the form that takes the last
>> known stack location and displays all text symbols found do I get
>> close for cpu#7. cpu#0 still does not seem to be anywhere close. This
>> could be a problem with crash, or with the way PVM works; I am not
>> sure.
>>
>> Anyway, from what I could take from that situation, it seemed that
>> cpu#1 (the one that had soft lockup warnings in the log) was trying
>> to do a wake_up on a task that seemed to have just done an io
>> schedule on cpu#7 (but the waitqueue_head spinlock is currently
>> locked). cpu#7 tries to get the runqueue[0] spinlock to do an idle
>> migration (though the lock currently is free). So I guessed that
>> maybe cpu#0 was just woken for the free lock but has not grabbed
>> it yet.
>>
>> From there I wondered whether something that holds a spinlock for
>> longer than the quick-path timeout (which causes new lockers to go
>> into the slow path) could cause a stall on a high-numbered cpu when
>> the lock is contended (as the search and signalling is only done for
>> the first waiter, starting from 0). That led to the patch below. The
>> good thing about it: it does not break things immediately. The bad
>> thing: it does not help with the problem. :/
>>
>> The interesting thing when looking at a second dump, taken with a
>> test kernel using the patch below, was that again 2 cpus seemed to
>> spin slowly on a lock that was free in the dump. And again at least
>> one did not seem to be doing anything spinlock related (anymore?).
>> One other detail (though that could be just coincidence as well) was
>> that with the unmodified kernel I usually would end up with soft
>> lockup messages; with the patched kernel, it was a stall detected by
>> rcu_bh (0 and 1 stalled, detected by 3).
>>
>> Now I am a bit confused and wonder about some basic things:
>> 1. When a cpu goes into the slow lock path and into the poll_irq,
>> shouldn't I expect this one to be in hypercall_page in the
>> dump?
>> 2. When does the whole handling for interrupted spinlock wait
>> apply? I assume only for a spinlock taken without irqs
>> disabled and then trying to acquire another one in an
>> interrupt handler. Is that correct?
>> 3. Not really related but I just stumbled over it:
>> In xen_spin_trylock: "asm("xchgb %b0,%1"...)
>> What does the "b" part of %b0 do? I thought xchgb already
>> would say it is a byte value...
>>
>> But putting aside those questions, the fact that both times there
>> were two cpus waiting on the same lock which, going by the lock
>> count (=0), was free seems a bit too much of a coincidence.
>> Oh, and the spinners count in the lock was 2, as one would
>> expect.
>>
>> struct rq {
>>   lock = {
>>     raw_lock = {
>>       {
>>         head_tail = 512,
>>         tickets = {
>>           head = 0 '\000',
>>           tail = 2 '\002'
>>         }
>>       }
>>     }
>>   },
>>   nr_running = 1,
>>   ...
>> }
>>
>> I really don't know how this happens. Especially cpu#0 seems at
>> least past the wakeup and should have removed itself from the
>> spinners list...
>>
>> -Stefan
>
> We (Citrix) have an outstanding bug, on 2.6.32 classic, which we have
> never managed to get to the bottom of (because of an inability to
> reliably reproduce), and which might be related.
>
> In our case, certain processes were locking up, and it turned out that
> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
> on your launchpad ticket) on its own spinlock IPI event channel
> (understandable, as it is a spinlock), but with the event channel
> masked, so it would never wake up from the poll.
Unfortunately either crash or something else is wrong for the PV guest,
as HYPERVISOR_shared_info appears completely blank when I look at it.
>
> Can you reliably reproduce the issue? If so, would it be possible to
> capture hypervisor debug keys 'q' and 'e' when the badness happens?
> This would clearly show whether our bugs are related or not.
Yes, as mentioned, the python script in the bug report does that when
the other conditions are met. I certainly can do the debug keys on that.
>
> Our repro case took about a month to reproduce, and unfortunately
> turning lockdep on has appeared to fix the problem.
Yeah, having lockdep enabled was another thing that made it less likely. Though
some reporters had lockups with it, too. Just much less likely.
-Stefan
>
> ~Andrew
>
>>
>> [1] http://bugs.launchpad.net/bugs/1011792
>>
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45 ` Stefan Bader
@ 2012-10-17 13:55 ` Ian Campbell
2012-10-17 15:21 ` Stefan Bader
2012-10-17 14:51 ` Jan Beulich
2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2012-10-17 13:55 UTC (permalink / raw)
To: Andrew Cooper
Cc: Konrad Rzeszutek Wilk, stefan.bader@canonical.com,
xen-devel@lists.xen.org
On Wed, 2012-10-17 at 14:28 +0100, Andrew Cooper wrote:
> In our case, certain processes were locking up, and it turned out that
> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
> on your launchpad ticket) on its own spinlock IPI event channel
> (understandable, as it is a spinlock), but with the event channel
> masked, so it would never wake up from the poll.
I think (but you might want to check) that SCHEDOP_poll works (or is
supposed to work!) regardless of whether the specific evtchn you ask for
is masked or not.
The Linux PV spinlock implementation takes advantage of this because it
never wants to take a real interrupt from the spinlock poller evtchn.
The IRQ handler for the spinlock evtchn in Linux is:
static irqreturn_t dummy_handler(int irq, void *dev_id)
{
        BUG();
        return IRQ_HANDLED;
}
and right after we register it:
disable_irq(irq); /* make sure it's never delivered */
There is no enable -- ignoring bugs, of which there have been a couple
of instances, but those trigger the BUG() so are pretty obvious.
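For completeness, the poll itself is issued via xen_poll_irq(), which
(roughly, going from memory of drivers/xen/events.c in kernels of this
era -- a sketch, not the verbatim source) boils down to:

static void xen_poll_irq(int irq)
{
        struct sched_poll poll;
        evtchn_port_t evtchn = evtchn_from_irq(irq);

        poll.nr_ports = 1;
        poll.timeout = 0;       /* no timeout: block until kicked */
        set_xen_guest_handle(poll.ports, &evtchn);

        /* Blocks this vcpu until the port has an event pending --
         * irrespective of the port's mask bit, as described above. */
        if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
                BUG();
}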
Ian.
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45 ` Stefan Bader
2012-10-17 13:55 ` Ian Campbell
@ 2012-10-17 14:51 ` Jan Beulich
2012-10-17 15:12 ` Andrew Cooper
2 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-17 14:51 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, stefan.bader, xen-devel
>>> On 17.10.12 at 15:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> In our case, certain processes were locking up, and it turned out that
> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
> on your launchpad ticket) on its own spinlock IPI event channel
> (understandable, as it is a spinlock), but with the event channel
> masked, so it would never wake up from the poll.
Probably some misunderstanding: The event channel used
here will always be masked, and whether an event channel is
masked doesn't matter for the purposes of polling it.
Jan
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 14:51 ` Jan Beulich
@ 2012-10-17 15:12 ` Andrew Cooper
0 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-10-17 15:12 UTC (permalink / raw)
To: Jan Beulich
Cc: Konrad Rzeszutek Wilk, stefan.bader@canonical.com,
xen-devel@lists.xen.org
On 17/10/12 15:51, Jan Beulich wrote:
>>>> On 17.10.12 at 15:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In our case, certain processes were locking up, and it turned out that
>> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
>> on your launchpad ticket) on its own spinlock IPI event channel
>> (understandable, as it is a spinlock), but with the event channel
>> masked, so it would never wake up from the poll.
> Probably some misunderstanding: The event channel used
> here will always be masked, and whether an event channel is
> masked doesn't matter for the purposes of polling it.
>
> Jan
>
Sorry - I was not very clear.
In our case, we saw every single VCPU stuck in a SCHEDOP_poll on its own
spinlock event channel, which renders the guest completely deadlocked.
My understanding was that while some VCPUs were blocked in the
hypervisor, the VCPU with the lock should raise an event when the lock
was released, waking the other VCPUs so one could grab the lock. The
implication of this being that there is a bug in the spinlock code.
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 13:55 ` Ian Campbell
@ 2012-10-17 15:21 ` Stefan Bader
2012-10-17 15:35 ` Andrew Cooper
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-17 15:21 UTC (permalink / raw)
To: Ian Campbell
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org
On 17.10.2012 15:55, Ian Campbell wrote:
> On Wed, 2012-10-17 at 14:28 +0100, Andrew Cooper wrote:
>> In our case, certain processes were locking up, and it turned out that
>> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
>> on your launchpad ticket) on its own spinlock IPI event channel
>> (understandable, as it is a spinlock), but with the event channel
>> masked, so it would never wake up from the poll.
>
> I think (but you might want to check) that SCHEDOP_poll works (or is
> supposed to work!) regardless of whether the specific evtchn you ask for
> is masked or not.
I was assuming it is supposed to work, at least in Xen 4.1.2. Or at
least, if it did not, I would expect to catch VCPUs sitting on the
hypercall rather than doing nothing. Of course I cannot say how
reliable the information in crash is, as this is something new to do
after Daniel fixed crash.
>
> The Linux PV spinlock implementation takes advantage of this because it
> never wants to take a real interrupt from the spinlock poller evtchn.
Right, I probably should have realized that. Though I guess it is still
interesting to see whether the channel is pending.
So when just recreating, I got the q and e info which is (assuming the guest
domain is enough):
(XEN) Event channel information for domain 1:
(XEN) Polling vCPUs: {1,4,6}
(XEN) port [p/m]
(XEN) 1 [0/0]: s=3 n=0 d=0 p=68 x=0
(XEN) 2 [1/0]: s=3 n=0 d=0 p=69 x=0
(XEN) 3 [1/0]: s=5 n=0 v=0 x=0
(XEN) 4 [1/1]: s=6 n=0 x=0
(XEN) 5 [1/0]: s=6 n=0 x=0
(XEN) 6 [0/0]: s=6 n=0 x=0
(XEN) 7 [0/0]: s=5 n=0 v=1 x=0
(XEN) 8 [0/0]: s=6 n=0 x=0
(XEN) 9 [1/0]: s=5 n=1 v=0 x=0
(XEN) 10 [0/1]: s=6 n=1 x=0
(XEN) 11 [1/0]: s=6 n=1 x=0
(XEN) 12 [0/0]: s=6 n=1 x=0
(XEN) 13 [0/0]: s=5 n=1 v=1 x=0
(XEN) 14 [0/0]: s=6 n=1 x=0
(XEN) 15 [0/0]: s=5 n=2 v=0 x=0
(XEN) 16 [1/1]: s=6 n=2 x=0
(XEN) 17 [0/0]: s=6 n=2 x=0
(XEN) 18 [0/0]: s=6 n=2 x=0
(XEN) 19 [0/0]: s=5 n=2 v=1 x=0
(XEN) 20 [0/0]: s=6 n=2 x=0
(XEN) 21 [0/0]: s=5 n=3 v=0 x=0
(XEN) 22 [1/1]: s=6 n=3 x=0
(XEN) 23 [0/0]: s=6 n=3 x=0
(XEN) 24 [0/0]: s=6 n=3 x=0
(XEN) 25 [0/0]: s=5 n=3 v=1 x=0
(XEN) 26 [0/0]: s=6 n=3 x=0
(XEN) 27 [1/0]: s=5 n=4 v=0 x=0
(XEN) 28 [0/1]: s=6 n=4 x=0
(XEN) 29 [1/0]: s=6 n=4 x=0
(XEN) 30 [0/0]: s=6 n=4 x=0
(XEN) 31 [0/0]: s=5 n=4 v=1 x=0
(XEN) 32 [0/0]: s=6 n=4 x=0
(XEN) 33 [0/0]: s=5 n=5 v=0 x=0
(XEN) 34 [0/1]: s=6 n=5 x=0
(XEN) 35 [0/0]: s=6 n=5 x=0
(XEN) 36 [0/0]: s=6 n=5 x=0
(XEN) 37 [0/0]: s=5 n=5 v=1 x=0
(XEN) 38 [0/0]: s=6 n=5 x=0
(XEN) 39 [1/0]: s=5 n=6 v=0 x=0
(XEN) 40 [0/1]: s=6 n=6 x=0
(XEN) 41 [1/0]: s=6 n=6 x=0
(XEN) 42 [0/0]: s=6 n=6 x=0
(XEN) 43 [0/0]: s=5 n=6 v=1 x=0
(XEN) 44 [0/0]: s=6 n=6 x=0
(XEN) 45 [0/0]: s=5 n=7 v=0 x=0
(XEN) 46 [1/1]: s=6 n=7 x=0
(XEN) 47 [0/0]: s=6 n=7 x=0
(XEN) 48 [0/0]: s=6 n=7 x=0
(XEN) 49 [0/0]: s=5 n=7 v=1 x=0
(XEN) 50 [0/0]: s=6 n=7 x=0
(XEN) 51 [0/0]: s=3 n=7 d=0 p=70 x=0
(XEN) 52 [0/0]: s=3 n=0 d=0 p=71 x=0
(XEN) 53 [0/0]: s=3 n=0 d=0 p=72 x=0
(XEN) 54 [0/0]: s=3 n=0 d=0 p=73 x=0
(XEN) 55 [1/0]: s=3 n=0 d=0 p=74 x=0
[maybe someone can tell me what the s,n,d,p and x mean]
(XEN) Rangesets belonging to domain 1:
(XEN) I/O Ports { }
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN) DomPage list too long to display
(XEN) XenPage 00000000008000ab: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 00000000008000aa: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 00000000008000a9: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 00000000008000a8: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 00000000000dfae4: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 1:
(XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
dirty_cpus={3} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
dirty_cpus={} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU2: CPU4 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
dirty_cpus={} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU3: CPU5 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
dirty_cpus={} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
dirty_cpus={} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU5: CPU7 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
dirty_cpus={7} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
dirty_cpus={} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) VCPU7: CPU6 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask = 01
dirty_cpus={6} cpu_affinity={0-127}
(XEN) No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 5, stat 0/0/0)
(XEN) Notifying guest 0:1 (virq 1, port 11, stat 0/0/0)
(XEN) Notifying guest 0:2 (virq 1, port 17, stat 0/0/0)
(XEN) Notifying guest 0:3 (virq 1, port 23, stat 0/0/0)
(XEN) Notifying guest 0:4 (virq 1, port 29, stat 0/0/0)
(XEN) Notifying guest 0:5 (virq 1, port 35, stat 0/0/0)
(XEN) Notifying guest 0:6 (virq 1, port 41, stat 0/0/0)
(XEN) Notifying guest 0:7 (virq 1, port 47, stat 0/0/0)
(XEN) Notifying guest 1:0 (virq 1, port 7, stat 0/0/-1)
(XEN) Notifying guest 1:1 (virq 1, port 13, stat 0/0/-1)
(XEN) Notifying guest 1:2 (virq 1, port 19, stat 0/0/0)
(XEN) Notifying guest 1:3 (virq 1, port 25, stat 0/0/0)
(XEN) Notifying guest 1:4 (virq 1, port 31, stat 0/0/-1)
(XEN) Notifying guest 1:5 (virq 1, port 37, stat 0/0/0)
(XEN) Notifying guest 1:6 (virq 1, port 43, stat 0/0/-1)
(XEN) Notifying guest 1:7 (virq 1, port 49, stat 0/0/0)
When the guest was unresponsive the console would still show:
[10174.372092] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 1} (detected
by 4, t=15002 jiffies)
[10284.448089] INFO: rcu_bh detected stalls on CPUs/tasks: { 0 1 4 6} (detected
by 5, t=15004 jiffies)
in a repeating pattern. So I take the above as cpus 1,4 and 6 are polling. From
the dump and the content of lock_spinners I get:
cpu 0 and 1 -> ffff8803bfc13700 (which is runqueue[0] and is unlocked again)
cpu 4 and 6 -> ffffffff81f15ef0 (which is blkif_io_lock and is locked)
Backtraces would be somewhat inconsistent (as always). Note, I should mention
that I still had a kernel with my patch applied on that guest. That changes
things a bit (actually it takes a bit longer to hang but again that might be
just a matter of timing). The strange lock state of 2 spinners on an unlocked
lock remains the same with or without it.
One question about the patch, actually: would anybody think that there
could be a case where the unlocking cpu has itself on the spinners
list? I did not think so, but that might be wrong.
>
> The IRQ handler for the spinlock evtchn in Linux is:
> static irqreturn_t dummy_handler(int irq, void *dev_id)
> {
> BUG();
> return IRQ_HANDLED;
> }
>
> and right after we register it:
> disable_irq(irq); /* make sure it's never delivered */
>
> The is no enable -- ignoring bugs of which there have been couple of
> instances, but those trigger the BUG() so are pretty obvious.
>
> Ian.
>
>
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 15:21 ` Stefan Bader
@ 2012-10-17 15:35 ` Andrew Cooper
2012-10-17 16:27 ` Stefan Bader
2012-10-18 7:00 ` Jan Beulich
0 siblings, 2 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-10-17 15:35 UTC (permalink / raw)
To: Stefan Bader; +Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel@lists.xen.org
On 17/10/12 16:21, Stefan Bader wrote:
> On 17.10.2012 15:55, Ian Campbell wrote:
>> On Wed, 2012-10-17 at 14:28 +0100, Andrew Cooper wrote:
>>> In our case, certain processes were locking up, and it turned out that
>>> the kernel was issuing SCHEDOP_poll hypercalls (same as the stack trace
>>> on your launchpad ticket) on its own spinlock IPI event channel
>>> (understandable, as it is a spinlock), but with the event channel
>>> masked, so it would never wake up from the poll.
>>
>> I think (but you might want to check) that SCHEDOP_poll works (or is
>> supposed to work!) regardless of whether the specific evtchn you ask for
>> is masked or not.
>
> I was assuming it is supposed to work, at least in Xen 4.1.2. Or at
> least, if it did not, I would expect to catch VCPUs sitting on the
> hypercall rather than doing nothing. Of course I cannot say how
> reliable the information in crash is, as this is something new to do
> after Daniel fixed crash.
>
>>
>> The Linux PV spinlock implementation takes advantage of this because it
>> never wants to take a real interrupt from the spinlock poller evtchn.
>
> Right, I probably should have realized that. Though I guess it is still
> interesting to see whether the channel is pending.
>
> So when just recreating, I got the q and e info which is (assuming the
> guest domain is enough):
>
> (XEN) Event channel information for domain 1:
> (XEN) Polling vCPUs: {1,4,6}
> (XEN) port [p/m]
> (XEN) 1 [0/0]: s=3 n=0 d=0 p=68 x=0
> (XEN) 2 [1/0]: s=3 n=0 d=0 p=69 x=0
> (XEN) 3 [1/0]: s=5 n=0 v=0 x=0
> (XEN) 4 [1/1]: s=6 n=0 x=0
> (XEN) 5 [1/0]: s=6 n=0 x=0
> (XEN) 6 [0/0]: s=6 n=0 x=0
> (XEN) 7 [0/0]: s=5 n=0 v=1 x=0
> (XEN) 8 [0/0]: s=6 n=0 x=0
> (XEN) 9 [1/0]: s=5 n=1 v=0 x=0
> (XEN) 10 [0/1]: s=6 n=1 x=0
> (XEN) 11 [1/0]: s=6 n=1 x=0
> (XEN) 12 [0/0]: s=6 n=1 x=0
> (XEN) 13 [0/0]: s=5 n=1 v=1 x=0
> (XEN) 14 [0/0]: s=6 n=1 x=0
> (XEN) 15 [0/0]: s=5 n=2 v=0 x=0
> (XEN) 16 [1/1]: s=6 n=2 x=0
> (XEN) 17 [0/0]: s=6 n=2 x=0
> (XEN) 18 [0/0]: s=6 n=2 x=0
> (XEN) 19 [0/0]: s=5 n=2 v=1 x=0
> (XEN) 20 [0/0]: s=6 n=2 x=0
> (XEN) 21 [0/0]: s=5 n=3 v=0 x=0
> (XEN) 22 [1/1]: s=6 n=3 x=0
> (XEN) 23 [0/0]: s=6 n=3 x=0
> (XEN) 24 [0/0]: s=6 n=3 x=0
> (XEN) 25 [0/0]: s=5 n=3 v=1 x=0
> (XEN) 26 [0/0]: s=6 n=3 x=0
> (XEN) 27 [1/0]: s=5 n=4 v=0 x=0
> (XEN) 28 [0/1]: s=6 n=4 x=0
> (XEN) 29 [1/0]: s=6 n=4 x=0
> (XEN) 30 [0/0]: s=6 n=4 x=0
> (XEN) 31 [0/0]: s=5 n=4 v=1 x=0
> (XEN) 32 [0/0]: s=6 n=4 x=0
> (XEN) 33 [0/0]: s=5 n=5 v=0 x=0
> (XEN) 34 [0/1]: s=6 n=5 x=0
> (XEN) 35 [0/0]: s=6 n=5 x=0
> (XEN) 36 [0/0]: s=6 n=5 x=0
> (XEN) 37 [0/0]: s=5 n=5 v=1 x=0
> (XEN) 38 [0/0]: s=6 n=5 x=0
> (XEN) 39 [1/0]: s=5 n=6 v=0 x=0
> (XEN) 40 [0/1]: s=6 n=6 x=0
> (XEN) 41 [1/0]: s=6 n=6 x=0
> (XEN) 42 [0/0]: s=6 n=6 x=0
> (XEN) 43 [0/0]: s=5 n=6 v=1 x=0
> (XEN) 44 [0/0]: s=6 n=6 x=0
> (XEN) 45 [0/0]: s=5 n=7 v=0 x=0
> (XEN) 46 [1/1]: s=6 n=7 x=0
> (XEN) 47 [0/0]: s=6 n=7 x=0
> (XEN) 48 [0/0]: s=6 n=7 x=0
> (XEN) 49 [0/0]: s=5 n=7 v=1 x=0
> (XEN) 50 [0/0]: s=6 n=7 x=0
> (XEN) 51 [0/0]: s=3 n=7 d=0 p=70 x=0
> (XEN) 52 [0/0]: s=3 n=0 d=0 p=71 x=0
> (XEN) 53 [0/0]: s=3 n=0 d=0 p=72 x=0
> (XEN) 54 [0/0]: s=3 n=0 d=0 p=73 x=0
> (XEN) 55 [1/0]: s=3 n=0 d=0 p=74 x=0
>
> [maybe someone can tell me what the s,n,d,p and x mean]
s = state: 0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain,
    4 = pirq, 5 = virq, 6 = ipi
n = target vcpu id to notify
x = boolean indicating whether xen is a consumer of the event channel
    or not
d = target domain (when appropriate); in this case, p is the target port
>
>
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) XenPage 00000000008000ab: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000aa: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000dfae4: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
> dirty_cpus={3} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU2: CPU4 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU3: CPU5 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU5: CPU7 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00
> dirty_cpus={7} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU7: CPU6 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask = 01
> dirty_cpus={6} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 5, stat 0/0/0)
> (XEN) Notifying guest 0:1 (virq 1, port 11, stat 0/0/0)
> (XEN) Notifying guest 0:2 (virq 1, port 17, stat 0/0/0)
> (XEN) Notifying guest 0:3 (virq 1, port 23, stat 0/0/0)
> (XEN) Notifying guest 0:4 (virq 1, port 29, stat 0/0/0)
> (XEN) Notifying guest 0:5 (virq 1, port 35, stat 0/0/0)
> (XEN) Notifying guest 0:6 (virq 1, port 41, stat 0/0/0)
> (XEN) Notifying guest 0:7 (virq 1, port 47, stat 0/0/0)
> (XEN) Notifying guest 1:0 (virq 1, port 7, stat 0/0/-1)
> (XEN) Notifying guest 1:1 (virq 1, port 13, stat 0/0/-1)
> (XEN) Notifying guest 1:2 (virq 1, port 19, stat 0/0/0)
> (XEN) Notifying guest 1:3 (virq 1, port 25, stat 0/0/0)
> (XEN) Notifying guest 1:4 (virq 1, port 31, stat 0/0/-1)
> (XEN) Notifying guest 1:5 (virq 1, port 37, stat 0/0/0)
> (XEN) Notifying guest 1:6 (virq 1, port 43, stat 0/0/-1)
> (XEN) Notifying guest 1:7 (virq 1, port 49, stat 0/0/0)
So in this case, vcpu 1 is in a poll on port 10, which is an IPI event
channel for itself.
Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 40.
In each case, the event channels are masked (no surprise given the
conversation so far on this thread), and have no pending events.
Therefore, I believe we are looking at the same bug.
>
>
> When the guest was unresponsive the console would still show:
> [10174.372092] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 1}
> (detected by 4, t=15002 jiffies)
> [10284.448089] INFO: rcu_bh detected stalls on CPUs/tasks: { 0 1 4 6}
> (detected by 5, t=15004 jiffies)
>
> in a repeating pattern. So I take the above as cpus 1,4 and 6 are
> polling. From the dump and the content of lock_spinners I get:
>
> cpu 0 and 1 -> ffff8803bfc13700 (which is runqueue[0] and is unlocked
> again)
> cpu 4 and 6 -> ffffffff81f15ef0 (which is blkif_io_lock and is locked)
I wonder if there is possibly a race condition between notifying that a
lock has been unlocked, and another vcpu trying to poll after deciding
that the lock is locked.
The other option is that there is a bug in working out which event
channel to notify when a lock is unlocked.
~Andrew
>
>
> Backtraces would be somewhat inconsistent (as always). Note, I should
> mention that I still had a kernel with my patch applied on that guest.
> That changes things a bit (actually it takes a bit longer to hang but
> again that might be just a matter of timing). The strange lock state
> of 2 spinners on an unlocked lock remains the same with or without it.
>
> One question about the patch, actually: would anybody think that there
> could be a case where the unlocking cpu has itself on the spinners
> list? I did not think so, but that might be wrong.
>>
>> The IRQ handler for the spinlock evtchn in Linux is:
>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>> {
>>         BUG();
>>         return IRQ_HANDLED;
>> }
>>
>> and right after we register it:
>> disable_irq(irq); /* make sure it's never delivered */
>>
>> There is no enable -- ignoring bugs, of which there have been a couple
>> of instances, but those trigger the BUG() so are pretty obvious.
>>
>> Ian.
>>
>>
>
>
>
>
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 15:35 ` Andrew Cooper
@ 2012-10-17 16:27 ` Stefan Bader
2012-10-17 17:46 ` Andrew Cooper
2012-10-18 7:00 ` Jan Beulich
1 sibling, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-17 16:27 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel@lists.xen.org, Ian Campbell, Konrad Rzeszutek Wilk
On 17.10.2012 17:35, Andrew Cooper wrote:
>> (XEN) Event channel information for domain 1:
>> (XEN) Polling vCPUs: {1,4,6}
>> (XEN) port [p/m]
>> (XEN) 4 [1/1]: s=6 n=0 x=0
>> (XEN) 10 [0/1]: s=6 n=1 x=0
>> (XEN) 28 [0/1]: s=6 n=4 x=0
>> (XEN) 40 [0/1]: s=6 n=6 x=0
>>
> s = state: 0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain,
>     4 = pirq, 5 = virq, 6 = ipi
> n = target vcpu id to notify
> x = boolean indicating whether xen is a consumer of the event channel
>     or not
>
> d = target domain (when appropriate); in this case, p is the target port
>
Thanks (at least something learned today :)). One thing I noticed here:
in the event channel info above, pending is 0 for channels 10, 28 and 40
(and set for 4, which is the spinlock ipi for cpu 0). But in the VCPU
info below (another unknown: has=T and F) it says upcall_pend for all of
them. Unfortunately that might just mean that things changed...
>> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={3} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>
> So in this case, vcpu 1 is in a poll on port 10, which is an IPI event
> channel for itself.
>
> Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 40.
>
>
> I wonder if there is possibly a race condition between notifying that a
> lock has been unlocked, and another vcpu trying to poll after deciding
> that the lock is locked.
There has to be something somewhere, I just cannot spot it. The
unlocking cpu will do a wmb() before setting the lock to 0, then a
mb(), and then check for spinners. When failing the quick path, a
locker will first set the lock_spinners entry, then do a wmb() and
increment the spinners count. After that it clears the event pending
and then checks the lock again before actually going into the poll.
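To spell out that ordering (a condensed sketch of xen_spin_unlock()
and the slow path of xen_spin_lock() as I read them, not the verbatim
code):

/* unlocker */
smp_wmb();                      /* no writes move past the unlock */
xl->lock = 0;                   /* release the lock */
smp_mb();                       /* order the store against the read below */
if (unlikely(xl->spinners))
        xen_spin_unlock_slow(xl);       /* pick a spinner, send the IPI */

/* slow-path locker */
prev = spinning_lock(xl);       /* publish xl in lock_spinners, wmb(),
                                 * then increment xl->spinners */
xen_clear_irq_pending(irq);     /* clear any stale pending state */
if (!xen_spin_trylock(lock))    /* re-check before blocking... */
        xen_poll_irq(irq);      /* ...then block in SCHEDOP_poll */
unspinning_lock(xl, prev);      /* once done, pop lock_spinners again */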
>
> The other option is that there is a bug in working out which event
> channel to notify when a lock is unlocked.
I thought I saw one thing, which I tried to fix with my patch. Another
train of thought was that some other cpu always grabs the lock as soon
as it gets released, preventing any cpu in the poll from succeeding.
But that would then show the lock as locked...
>
> ~Andrew
>
>>
>>
>> Backtraces would be somewhat inconsistent (as always). Note, I should
>> mention that I still had a kernel with my patch applied on that guest.
>> That changes things a bit (actually it takes a bit longer to hang but
>> again that might be just a matter of timing). The strange lock state
>> of 2 spinners on an unlocked lock remains the same with or without it.
>>
>> One question about the patch, actually: would anybody think that there
>> could be a case where the unlocking cpu has itself on the spinners
>> list? I did not think so, but that might be wrong.
>>>
>>> The IRQ handler for the spinlock evtchn in Linux is:
>>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>>> {
>>>         BUG();
>>>         return IRQ_HANDLED;
>>> }
>>>
>>> and right after we register it:
>>> disable_irq(irq); /* make sure it's never delivered */
>>>
>>> There is no enable -- ignoring bugs, of which there have been a couple
>>> of instances, but those trigger the BUG() so are pretty obvious.
>>>
>>> Ian.
>>>
>>>
>>
>>
>>
>>
>
>
>
>
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 16:27 ` Stefan Bader
@ 2012-10-17 17:46 ` Andrew Cooper
0 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2012-10-17 17:46 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel@lists.xen.org, Ian Campbell, Konrad Rzeszutek Wilk
On 17/10/12 17:27, Stefan Bader wrote:
> On 17.10.2012 17:35, Andrew Cooper wrote:
>
>>> (XEN) Event channel information for domain 1:
>>> (XEN) Polling vCPUs: {1,4,6}
>>> (XEN) port [p/m]
>>> (XEN) 4 [1/1]: s=6 n=0 x=0
>>> (XEN) 10 [0/1]: s=6 n=1 x=0
>>> (XEN) 28 [0/1]: s=6 n=4 x=0
>>> (XEN) 40 [0/1]: s=6 n=6 x=0
>>>
>> s = state: 0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain,
>>     4 = pirq, 5 = virq, 6 = ipi
>> n = target vcpu id to notify
>> x = boolean indicating whether xen is a consumer of the event channel
>>     or not
>>
>> d = target domain (when appropriate); in this case, p is the target port
>>
> Thanks (at least something learned today :)). One thing I noticed here:
> in the event channel info above, pending is 0 for channels 10, 28 and 40
> (and set for 4, which is the spinlock ipi for cpu 0). But in the VCPU
> info below (another unknown: has=T and F) it says upcall_pend for all of
> them. Unfortunately that might just mean that things changed...
I believe the upcall_pending is just a simple bit indicating that an
event channel for that vcpu has an event pending, but doesn't identify
which event channel it is.
Of course, as the VCPU is currently descheduled in Xen waiting for a
specific event channel to end the poll, other events are likely to build
up, but will not be serviced until the VCPU resumes.
The "has=" value is a boolean from struct vcpu, vcpu->is_running. This
is a bit more subtle, as a vcpu is still considered running if its
context is on the hypervisor pcpu stack, but Xen has invoked the idle
loop on that PCPU. I don't know whether this is relevant to the bug.
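For reference, the bits reported as upcall_pend/upcall_mask live in the
per-vcpu shared structure (roughly as declared in Xen's public headers;
the comments are my annotation):

struct vcpu_info {
        uint8_t evtchn_upcall_pending;    /* "upcall_pend": some event is
                                             pending for this vcpu */
        uint8_t evtchn_upcall_mask;       /* "upcall_mask": upcall delivery
                                             disabled for this vcpu */
        unsigned long evtchn_pending_sel; /* which words of the pending
                                             bitmap to scan */
        /* arch-specific and time fields follow */
};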
>
>>> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
>>> dirty_cpus={3} cpu_affinity={0-127}
>>> (XEN) No periodic timer
>>> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
>>> dirty_cpus={} cpu_affinity={0-127}
>>> (XEN) No periodic timer
>>> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
>>> dirty_cpus={} cpu_affinity={0-127}
>>> (XEN) No periodic timer
>>> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
>>> dirty_cpus={} cpu_affinity={0-127}
>>> (XEN) No periodic timer
>> So in this case, vcpu 1 is in a poll on port 10, which is an IPI event
>> channel for itself.
>>
>> Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 40.
>>
>> I wonder if there is possibly a race condition between notifying that a
>> lock has been unlocked, and another vcpu trying to poll after deciding
>> that the lock is locked.
> There has to be something somewhere, I just cannot spot it. The
> unlocking cpu will do a wmb() before setting the lock to 0, then a
> mb(), and then check for spinners. When failing the quick path, a
> locker will first set the lock_spinners entry, then do a wmb() and
> increment the spinners count. After that it clears the event pending
> and then checks the lock again before actually going into the poll.
>
>> The other option is that there is a bug in working out which event
>> channel to notify when a lock is unlocked.
> I thought I saw one thing, which I tried to fix with my patch. Another
> train of thought was that some other cpu always grabs the lock as soon
> as it gets released, preventing any cpu in the poll from succeeding.
> But that would then show the lock as locked...
How does the unlocking code choose an event channel to invoke? Is there
perhaps a race condition updating that value if another cpu nabs the
lock in between?
(I am just throwing ideas around - I am not very familiar with this code)
~Andrew
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-17 15:35 ` Andrew Cooper
2012-10-17 16:27 ` Stefan Bader
@ 2012-10-18 7:00 ` Jan Beulich
2012-10-18 7:08 ` Jan Beulich
2012-10-18 7:24 ` Stefan Bader
1 sibling, 2 replies; 29+ messages in thread
From: Jan Beulich @ 2012-10-18 7:00 UTC (permalink / raw)
To: Stefan Bader, Andrew Cooper
Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel@lists.xen.org
>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> In each case, the event channels are masked (no surprise given the
> conversation so far on this thread), and have no pending events.
> Therefore, I believe we are looking at the same bug.
That seems very unlikely (albeit not impossible) to me, given that
the non-pvops kernel uses ticket locks while the pvops one doesn't.
Jan
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 7:00 ` Jan Beulich
@ 2012-10-18 7:08 ` Jan Beulich
2012-10-18 7:38 ` Stefan Bader
2012-10-18 7:24 ` Stefan Bader
1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-18 7:08 UTC (permalink / raw)
To: Stefan Bader, andrew.cooper3
Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel@lists.xen.org
>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In each case, the event channels are masked (no surprise given the
>> conversation so far on this thread), and have no pending events.
>> Therefore, I believe we are looking at the same bug.
>
> That seems very unlikely (albeit not impossible) to me, given that
> the non-pvops kernel uses ticket locks while the pvops one doesn't.
And in fact we had a similar problem with our original ticket lock
implementation, exposed by an open coded lock in the scheduler's
run queue management. But that was really ticket lock specific, in
that a CPU could passively become the owner of a lock while polling -
that's impossible with pvops' byte locks afaict.
Jan
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 7:00 ` Jan Beulich
2012-10-18 7:08 ` Jan Beulich
@ 2012-10-18 7:24 ` Stefan Bader
1 sibling, 0 replies; 29+ messages in thread
From: Stefan Bader @ 2012-10-18 7:24 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, xen-devel@lists.xen.org, Ian Campbell,
Konrad Rzeszutek Wilk
On 18.10.2012 09:00, Jan Beulich wrote:
>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> In each case, the event channels are masked (no surprise given the
>> conversation so far on this thread), and have no pending events.
>> Therefore, I believe we are looking at the same bug.
>
> That seems very unlikely (albeit not impossible) to me, given that
> the non-pvops kernel uses ticket locks while the pvops one doesn't.
True, if classic 2.6.32 actually meant non-pvops. In my memory that was
the version that first introduced them, so I assumed we were talking
about the same base. The one I am looking at seems (from as much info as
is available right now, though I am trying to get confirmation from
reporters running a non-paravirt-spinlock kernel on a production load)
to be directly related to the paravirtualized spinlock implementation
for Xen.
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 7:08 ` Jan Beulich
@ 2012-10-18 7:38 ` Stefan Bader
2012-10-18 7:48 ` Ian Campbell
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-18 7:38 UTC (permalink / raw)
To: Jan Beulich
Cc: andrew.cooper3, xen-devel@lists.xen.org, Ian Campbell,
Konrad Rzeszutek Wilk
On 18.10.2012 09:08, Jan Beulich wrote:
>>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> In each case, the event channels are masked (no surprise given the
>>> conversation so far on this thread), and have no pending events.
>>> Therefore, I believe we are looking at the same bug.
>>
>> That seems very unlikely (albeit not impossible) to me, given that
>> the non-pvops kernel uses ticket locks while the pvops one doesn't.
>
> And in fact we had a similar problem with our original ticket lock
> implementation, exposed by an open coded lock in the scheduler's
> run queue management. But that was really ticket lock specific, in
> that a CPU could passively become the owner of a lock while polling -
> that's impossible with pvops' byte locks afaict.
One of the trains of thought I had was whether it could happen that a
cpu is polling and the task gets moved. But I don't think that can
happen, as the hypercall is unlikely to be a place where any schedule
happens (preempt is none). And it would be much more common...
One detail which I hope someone can fill in is the whole "interrupted
spinlock" thing: saving the last lock pointer stored in the per-cpu
lock_spinners and so on. Is that really only for spinlocks taken without
interrupts disabled, or am I missing something there?
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 7:38 ` Stefan Bader
@ 2012-10-18 7:48 ` Ian Campbell
2012-10-18 10:20 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2012-10-18 7:48 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, xen-devel@lists.xen.org, Jan Beulich,
Konrad Rzeszutek Wilk
On Thu, 2012-10-18 at 08:38 +0100, Stefan Bader wrote:
> On 18.10.2012 09:08, Jan Beulich wrote:
> >>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
> >>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>> In each case, the event channels are masked (no surprise given the
> >>> conversation so far on this thread), and have no pending events.
> >>> Therefore, I believe we are looking at the same bug.
> >>
> >> That seems very unlikely (albeit not impossible) to me, given that
> >> the non-pvops kernel uses ticket locks while the pvops one doesn't.
> >
> > And in fact we had a similar problem with our original ticket lock
> > implementation, exposed by an open coded lock in the scheduler's
> > run queue management. But that was really ticket lock specific, in
> > that a CPU could passively become the owner of a lock while polling -
> > that's impossible with pvops' byte locks afaict.
>
> One of the trains of thought I had was whether it could happen that a cpu is in
> polling while its task gets moved. But I don't think that can happen, as the
> hypercall is unlikely to be a place where any scheduling happens (preempt is set
> to none). And if it could, the problem would be much more common...
>
> One detail which I hope someone can fill in is the whole "interrupted spinlock"
> thing: saving the last lock pointer stored in the per-cpu lock_spinners and so
> on. Is that really only for spinlocks taken without interrupts disabled, or do I
> miss something there?
spinning_lock() returns the old lock which the caller is expected to
remember and replace via unspinning_lock() -- it effectively implements
a stack of locks which are being waited on. xen_spin_lock_slow (the only
caller) appears to do this correctly from a brief inspection.
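For reference, the helpers behind that stack in arch/x86/xen/spinlock.c look
roughly like this (condensed and from memory, so treat it as a sketch rather
than a verbatim quote):

static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
	struct xen_spinlock *prev;

	prev = __this_cpu_read(lock_spinners);	/* lock we interrupted, if any */
	__this_cpu_write(lock_spinners, xl);	/* announce what we poll on */
	wmb();					/* publish before bumping count */
	xl->spinners++;

	return prev;
}

static inline void unspinning_lock(struct xen_spinlock *xl,
				   struct xen_spinlock *prev)
{
	xl->spinners--;
	wmb();					/* drop count before popping */
	__this_cpu_write(lock_spinners, prev);	/* pop back to the outer lock */
}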
Is there any chance this is just a simple AB-BA or similar type
deadlock? Do we have data which suggests all vCPUs are waiting on the
same lock or just that they are waiting on some lock? I suppose lockdep
(which I think you mentioned before?) would have caught this, unless pv
locks somehow confound it?
Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 7:48 ` Ian Campbell
@ 2012-10-18 10:20 ` Stefan Bader
2012-10-18 10:47 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-18 10:20 UTC (permalink / raw)
To: Ian Campbell
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Jan Beulich,
xen-devel@lists.xen.org
On 18.10.2012 09:48, Ian Campbell wrote:
> On Thu, 2012-10-18 at 08:38 +0100, Stefan Bader wrote:
>> On 18.10.2012 09:08, Jan Beulich wrote:
>>>>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> In each case, the event channels are masked (no surprise given the
>>>>> conversation so far on this thread), and have no pending events.
>>>>> Therefore, I believe we are looking at the same bug.
>>>>
>>>> That seems very unlikely (albeit not impossible) to me, given that
>>>> the non-pvops kernel uses ticket locks while the pvops one doesn't.
>>>
>>> And in fact we had a similar problem with our original ticket lock
>>> implementation, exposed by an open coded lock in the scheduler's
>>> run queue management. But that was really ticket lock specific,
>>> in that the fact that a CPU could passively become the owner of
>>> a lock while polling - that's impossible with pvops' byte locks afaict.
>>
>> One of the trains of thought I had was whether it could happen that a cpu is in
>> polling while its task gets moved. But I don't think that can happen, as the
>> hypercall is unlikely to be a place where any scheduling happens (preempt is set
>> to none). And if it could, the problem would be much more common...
>>
>> One detail which I hope someone can fill in is the whole "interrupted spinlock"
>> thing: saving the last lock pointer stored in the per-cpu lock_spinners and so
>> on. Is that really only for spinlocks taken without interrupts disabled, or do I
>> miss something there?
>
> spinning_lock() returns the old lock which the caller is expected to
> remember and replace via unspinning_lock() -- it effectively implements
> a stack of locks which are being waited on. xen_spin_lock_slow (the only
> caller) appears to do this correctly from a brief inspection.
Yes, but just *when* can there be a stack of locks (spinlocks)? The poll_irq
hypercall seems to be an active (in the sense of not preempting to another task)
process. How could there be a situation where another lock (on the same cpu)
is tried to be taken?
>
> Is there any chance this is just a simple AB-BA or similar type
> deadlock? Do we have data which suggests all vCPUs are waiting on the
> same lock or just that they are waiting on some lock? I suppose lockdep
> (which I think you mentioned before?) would have caught this, unless pv
> locks somehow confound it?
In the one situation where I went deeper into the tasks that appeared to be on a
cpu, one was waiting to signal a task that looked to have just been scheduled
out, while the cpu that task had been running on was doing an idle balance that
waited on the lock for cpu#0's runqueue. cpu#0 itself seemed to be waiting slow
(the lock pointer was in lock_spinners[0]), but the lock itself was 0.
Though there is a chance that this is always just a coincidental state where the
lock had just been released, and more related to how the Xen stack does a guest
dump. So the task would be to find out who holds the other lock.
Unfortunately, a kernel with full lock debugging enabled is sufficiently
different in timing that I cannot reproduce the issue on a test machine. And
from the reported crashes in production I have no data.
>
> Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 10:20 ` Stefan Bader
@ 2012-10-18 10:47 ` Jan Beulich
2012-10-18 12:43 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-18 10:47 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 18.10.12 at 12:20, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 18.10.2012 09:48, Ian Campbell wrote:
>> spinning_lock() returns the old lock which the caller is expected to
>> remember and replace via unspinning_lock() -- it effectively implements
>> a stack of locks which are being waited on. xen_spin_lock_slow (the only
>> caller) appears to do this correctly from a brief inspection.
>
> Yes, but just *when* can there be a stack of locks (spinlocks)? The poll_irq
> hypercall seems to be an active (in the sense of not preempting to another
> task) process. How could there be a situation where another lock (on the
> same cpu) is tried to be taken?
Obviously, when this is an acquire that does not disable interrupts, and
an interrupt comes in while in the poll hypercall (or about to go
there, or just having come back from one).
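Schematically (an illustrative sequence only; the locks A and B are made up):

/*
 * spin_lock(&A)                       - taken with irqs enabled
 *   xen_spin_lock_slow(&A)            - lock_spinners[cpu] = &A
 *     xen_poll_irq(irq)               - interrupt delivered here
 *       interrupt handler:
 *         spin_lock_irqsave(&B, f)    - contended as well
 *           xen_spin_lock_slow(&B)    - lock_spinners[cpu] = &B, prev = &A
 *           ...
 *         unspinning_lock(&B, &A)     - lock_spinners[cpu] = &A again
 *
 * So the per-cpu stack only ever grows beyond one entry through an
 * interrupt taken while waiting in the slow path.
 */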
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 10:47 ` Jan Beulich
@ 2012-10-18 12:43 ` Stefan Bader
2012-10-18 20:52 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-18 12:43 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
On 18.10.2012 12:47, Jan Beulich wrote:
>>>> On 18.10.12 at 12:20, Stefan Bader <stefan.bader@canonical.com> wrote:
>> On 18.10.2012 09:48, Ian Campbell wrote:
>>> spinning_lock() returns the old lock which the caller is expected to
>>> remember and replace via unspinning_lock() -- it effectively implements
>>> a stack of locks which are being waited on. xen_spin_lock_slow (the only
>>> caller) appears to do this correctly from a brief inspection.
>>
>> Yes, but just *when* can there be a stack of locks (spinlocks)? The poll_irq
>> hypercall seems to be an active (in the sense of not preempting to another
>> task) process. How could there be a situation where another lock (on the
>> same cpu) is tried to be taken?
>
> Obviously, when this is an acquire that does not disable interrupts, and
> an interrupt comes in while in the poll hypercall (or about to go
> there, or just having come back from one).
>
> Jan
>
Obviously. ;) Ok, so my thinking there was ok and it's one level deep max. At
some point of staring at things I start to question my sanity.
A wild thought would be whether in that case the interrupted spinlock may miss
a wakeup forever when the unlocker can only check for the toplevel. Hm, but that
should be easy to rule out by just adding an error message to spin_unlock_slow
when it fails to find anything...
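Something along these lines (an untested sketch against the same spinlock.c;
the warning at the end is the hypothetical addition, and it relies on the kick
stopping at the first match):

static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
	int cpu;

	ADD_STATS(released_slow, 1);

	for_each_online_cpu(cpu) {
		/* only the top of each cpu's stack is visible here */
		if (per_cpu(lock_spinners, cpu) == xl) {
			ADD_STATS(released_slow_kicked, 1);
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			return;
		}
	}

	/* hypothetical debug aid: a slow unlock that finds no spinner
	 * would point at a waiter buried below the top of a stack */
	printk_once(KERN_WARNING "xen_spin_unlock_slow: no spinner for %p\n", xl);
}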
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 12:43 ` Stefan Bader
@ 2012-10-18 20:52 ` Stefan Bader
2012-10-19 7:10 ` Stefan Bader
2012-10-19 8:06 ` Jan Beulich
0 siblings, 2 replies; 29+ messages in thread
From: Stefan Bader @ 2012-10-18 20:52 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, xen-devel@lists.xen.org, Ian Campbell,
Konrad Rzeszutek Wilk
On 18.10.2012 14:43, Stefan Bader wrote:
>> Obviously, when this is an acquire that does not disable interrupts, and
>> an interrupt comes in while in the poll hypercall (or about to go
>> there, or just having come back from one).
>>
>> Jan
>>
> Obviously. ;) Ok, so my thinking there was ok and it's one level deep max. At
> some point of staring at things I start to question my sanity.
> A wild thought would be whether in that case the interrupted spinlock may miss
> a wakeup forever when the unlocker can only check for the toplevel. Hm, but that
> should be easy to rule out by just adding an error message to spin_unlock_slow
> when it fails to find anything...
>
Actually I begin to suspect that it could be possible that I just overlooked the
most obvious thing. Provoking question: are we sure we are on the same page
about the purpose of the spin_lock_flags variant of the pv lock ops interface?
I begin to suspect that it really is not for giving a chance to re-enable
interrupts. Just what it should be used for I am not clear. Anyway it seems all
other places more or less ignore the flags and map themselves back to an
ignorant version of spinlock.
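(For reference, the generic mapping meant here lives in
arch/x86/kernel/paravirt-spinlocks.c and, as far as I recall, simply does:)

static void default_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
{
	/* the flags argument is dropped on the floor */
	arch_spin_lock(lock);
}

/* ...and pv_lock_ops wires this up as .spin_lock_flags = default_spin_lock_flags */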
Also I believe that the only high level function that would end up passing
any flags would be the spin_lock_irqsave one. And I am pretty sure that this
one will expect interrupts to stay disabled.
So I tried below approach and that seems to be surviving the previously breaking
testcase for much longer than anything I tried before.
-Stefan
From f2ebb6626f3e3a00932bf1f4f75265f826c7fba9 Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Thu, 18 Oct 2012 21:40:37 +0200
Subject: [PATCH 1/2] xen/pv-spinlock: Never enable interrupts in
xen_spin_lock_slow()
I am not sure what exactly the spin_lock_flags variant of the
pv-spinlocks (or even in the arch spinlocks) should be used for.
But it should not be used as an invitation to enable irqs.
The only high-level variant that seems to end up there is the
spin_lock_irqsave one and that would always be used in a context
that expects the interrupts to be disabled.
The generic paravirt-spinlock code just maps the flags variant
to the one without flags, so just do the same and get rid of
all the stuff that is not needed anymore.
This seems to resolve a weird locking issue seen with a high i/o
database load on a PV Xen guest with multiple (8+ in local
experiments) CPUs. Well, thinking about it a second time, it seems
like one of those "how did that ever work?" cases.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
arch/x86/xen/spinlock.c | 23 +++++------------------
1 file changed, 5 insertions(+), 18 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 83e866d..3330a1d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -24,7 +24,6 @@ static struct xen_spinlock_stats
u32 taken_slow_nested;
u32 taken_slow_pickup;
u32 taken_slow_spurious;
- u32 taken_slow_irqenable;
u64 released;
u32 released_slow;
@@ -197,7 +196,7 @@ static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock
__this_cpu_write(lock_spinners, prev);
}
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
+static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
{
struct xen_spinlock *xl = (struct xen_spinlock *)lock;
struct xen_spinlock *prev;
@@ -218,8 +217,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
ADD_STATS(taken_slow_nested, prev != NULL);
do {
- unsigned long flags;
-
/* clear pending */
xen_clear_irq_pending(irq);
@@ -239,12 +236,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
goto out;
}
- flags = arch_local_save_flags();
- if (irq_enable) {
- ADD_STATS(taken_slow_irqenable, 1);
- raw_local_irq_enable();
- }
-
/*
* Block until irq becomes pending. If we're
* interrupted at this point (after the trylock but
@@ -256,8 +247,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
*/
xen_poll_irq(irq);
- raw_local_irq_restore(flags);
-
ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
@@ -270,7 +259,7 @@ out:
return ret;
}
-static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
+static inline void __xen_spin_lock(struct arch_spinlock *lock)
{
struct xen_spinlock *xl = (struct xen_spinlock *)lock;
unsigned timeout;
@@ -302,19 +291,19 @@ static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
spin_time_accum_spinning(start_spin_fast);
} while (unlikely(oldval != 0 &&
- (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
+ (TIMEOUT == ~0 || !xen_spin_lock_slow(lock))));
spin_time_accum_total(start_spin);
}
static void xen_spin_lock(struct arch_spinlock *lock)
{
- __xen_spin_lock(lock, false);
+ __xen_spin_lock(lock);
}
static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
{
- __xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
+ __xen_spin_lock(lock);
}
static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
@@ -424,8 +413,6 @@ static int __init xen_spinlock_debugfs(void)
&spinlock_stats.taken_slow_pickup);
debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
&spinlock_stats.taken_slow_spurious);
- debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
- &spinlock_stats.taken_slow_irqenable);
debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
debugfs_create_u32("released_slow", 0444, d_spin_debug,
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 20:52 ` Stefan Bader
@ 2012-10-19 7:10 ` Stefan Bader
2012-10-19 8:06 ` Jan Beulich
1 sibling, 0 replies; 29+ messages in thread
From: Stefan Bader @ 2012-10-19 7:10 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
On 18.10.2012 22:52, Stefan Bader wrote:
> On 18.10.2012 14:43, Stefan Bader wrote:
> So I tried below approach and that seems to be surviving the previously breaking
> testcase for much longer than anything I tried before.
>
> -Stefan
>
> From f2ebb6626f3e3a00932bf1f4f75265f826c7fba9 Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Thu, 18 Oct 2012 21:40:37 +0200
> Subject: [PATCH 1/2] xen/pv-spinlock: Never enable interrupts in
> xen_spin_lock_slow()
>
The testcase ran the whole night without problems. Now I have re-started it with
a cleaner kernel (only the patch above applied and no debugging) and it has
again passed the threshold where it would have broken before.
-Stefan
Oh, just in case this *is* the solution, it should very definitely be marked
with a cc: stable.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-18 20:52 ` Stefan Bader
2012-10-19 7:10 ` Stefan Bader
@ 2012-10-19 8:06 ` Jan Beulich
2012-10-19 8:33 ` Stefan Bader
1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-19 8:06 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 18.10.12 at 22:52, Stefan Bader <stefan.bader@canonical.com> wrote:
> Actually I begin to suspect that it could be possible that I just overlooked
> the
> most obvious thing. Provoking question: are we sure we are on the same page
> about the purpose of the spin_lock_flags variant of the pv lock ops
> interface?
>
> I begin to suspect that it really is not for giving a chance to re-enable
> interrupts. Just what it should be used for I am not clear. Anyway it seems
> all
> other places more or less ignore the flags and map themselves back to an
> ignorant version of spinlock.
> Also I believe that the only high level function that would end up passing
> any flags would be the spin_lock_irqsave one. And I am pretty sure that
> this one will expect interrupts to stay disabled.
No - the only requirement here is that from the point on where
the lock is owned, interrupts must remain disabled. Re-enabling
intermediately is quite okay (and used to be done by the
native kernel prior to the conversion to ticket locks iirc).
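(To illustrate what is and is not required - a schematic sketch, not actual
kernel code:)

	unsigned long flags;

	local_irq_save(flags);
	while (!arch_spin_trylock(lock)) {
		if (!raw_irqs_disabled_flags(flags)) {
			local_irq_restore(flags);	/* brief window: irqs on */
			cpu_relax();
			local_irq_disable();		/* off again before retry */
		} else
			cpu_relax();
	}
	/* lock is owned from here on; irqs now stay off until the unlock */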
> So I tried below approach and that seems to be surviving the previously
> breaking
> testcase for much longer than anything I tried before.
If that indeed fixes your problem, then (minus eventual problems
with the scope of the interrupts-enabled window) this rather
points at a bug in the users of the spinlock interfaces.
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 8:06 ` Jan Beulich
@ 2012-10-19 8:33 ` Stefan Bader
2012-10-19 9:24 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-19 8:33 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
On 19.10.2012 10:06, Jan Beulich wrote:
>>>> On 18.10.12 at 22:52, Stefan Bader <stefan.bader@canonical.com> wrote:
>> Actually I begin to suspect that it could be possible that I just overlooked
>> the
>> most obvious thing. Provoking question: are we sure we are on the same page
>> about the purpose of the spin_lock_flags variant of the pv lock ops
>> interface?
>>
>> I begin to suspect that it really is not for giving a chance to re-enable
>> interrupts. Just what it should be used for I am not clear. Anyway it seems
>> all
>> other places more or less ignore the flags and map themselves back to an
>> ignorant version of spinlock.
>> Also I believe that the only high level function that would end up passing
>> any flags would be the spin_lock_irqsave one. And I am pretty sure that
>> this one will expect interrupts to stay disabled.
>
> No - the only requirement here is that from the point on where
> the lock is owned, interrupts must remain disabled. Re-enabling
> intermediately is quite okay (and used to be done by the
> native kernel prior to the conversion to ticket locks iirc).
Though it seems rather dangerous then. I don't remember the old code, but imo it
always opens up an (even microscopic) window for unexpected interruptions.
>
>> So I tried below approach and that seems to be surviving the previously
>> breaking
>> testcase for much longer than anything I tried before.
>
> If that indeed fixes your problem, then (minus eventual problems
> with the scope of the interrupts-enabled window) this rather
> points at a bug in the users of the spinlock interfaces.
I would be pragmatic here, none of the other current implementations seem to
re-enable interrupts and so this only affects xen pv. And how much really is
gained from enabling it compared to the risk of being affected by something that
nobody else will be?
>
> Jan
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 8:33 ` Stefan Bader
@ 2012-10-19 9:24 ` Jan Beulich
2012-10-19 14:03 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-19 9:24 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 19.10.12 at 10:33, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 19.10.2012 10:06, Jan Beulich wrote:
>>>>> On 18.10.12 at 22:52, Stefan Bader <stefan.bader@canonical.com> wrote:
>>> Actually I begin to suspect that it could be possible that I just overlooked
>
>>> the
>>> most obvious thing. Provoking question: are we sure we are on the same page
>>> about the purpose of the spin_lock_flags variant of the pv lock ops
>>> interface?
>>>
>>> I begin to suspect that it really is not for giving a chance to re-enable
>>> interrupts. Just what it should be used for I am not clear. Anyway it seems
>>> all
>>> other places more or less ignore the flags and map themselves back to an
>>> ignorant version of spinlock.
>>> Also I believe that the only high level function that would end up passing
>>> any flags would be the spin_lock_irqsave one. And I am pretty sure that
>>> this one will expect interrupts to stay disabled.
>>
>> No - the only requirement here is that from the point on where
>> the lock is owned, interrupts must remain disabled. Re-enabling
>> intermediately is quite okay (and used to be done by the
>> native kernel prior to the conversion to ticket locks iirc).
>
> Though it seems rather dangerous then. I don't remember the old code, but imo
> it always opens up an (even microscopic) window for unexpected interruptions.
There just can't be unexpected interruptions. Whenever interrupts
are enabled, it is expected that they can occur.
>>> So I tried below approach and that seems to be surviving the previously
>>> breaking
>>> testcase for much longer than anything I tried before.
>>
>> If that indeed fixes your problem, then (minus eventual problems
>> with the scope of the interrupts-enabled window) this rather
>> points at a bug in the users of the spinlock interfaces.
>
> I would be pragmatic here, none of the other current implementations seem to
> re-enable interrupts and so this only affects xen pv.
I don't think you really checked - the first arch I looked at (s390,
as being the most obvious one to look at when it comes to
virtualization) quite prominently re-enables interrupts in
arch_spin_lock_wait_flags().
> And how much really is
> gained from enabling it compared to the risk of being affected by something
> that nobody else will be?
The main difference between the native and virtualized cases is
that the period of time you spend waiting for the lock to become
available is pretty much unbounded (even more so without ticket
locks), and keeping interrupts disabled for such an extended
period of time is just going to ask for other problems.
And note that this isn't the case just for Xen PV - all virtualization
scenarios suffer from that.
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 9:24 ` Jan Beulich
@ 2012-10-19 14:03 ` Stefan Bader
2012-10-19 14:49 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-19 14:03 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, xen-devel@lists.xen.org, Ian Campbell,
Konrad Rzeszutek Wilk
On 19.10.2012 11:24, Jan Beulich wrote:
>>>> On 19.10.12 at 10:33, Stefan Bader <stefan.bader@canonical.com> wrote:
>> On 19.10.2012 10:06, Jan Beulich wrote:
>>>>>> On 18.10.12 at 22:52, Stefan Bader <stefan.bader@canonical.com>
>>>>>> wrote:
>>>> Actually I begin to suspect that it could be possible that I just
>>>> overlooked
>>
>>>> the most obvious thing. Provoking question: are we sure we are on the
>>>> same page about the purpose of the spin_lock_flags variant of the pv
>>>> lock ops interface?
>>>>
>>>> I begin to suspect that it really is not for giving a chance to
>>>> re-enable interrupts. Just what it should be used for I am not clear.
>>>> Anyway it seems all other places more or less ignore the flags and map
>>>> themselves back to an ignorant version of spinlock. Also I believe that
>>>> the only high level function that would end up in passing any flags,
>>>> would be the spin_lock_irqsave one. And I am pretty sure that this one
>>>> will expect interrupts to stay disabled.
>>>
>>> No - the only requirement here is that from the point on where the lock
>>> is owned, interrupts must remain disabled. Re-enabling intermediately is
>>> quite okay (and used to be done by the native kernel prior to the
>>> conversion to ticket locks iirc).
>>
>> Though it seems rather dangerous then. I don't remember the old code, but imo
>> it always opens up an (even microscopic) window for unexpected
>> interruptions.
>
> There just can't be unexpected interruptions. Whenever interrupts are
> enabled, it is expected that they can occur.
Probably one thing that makes this a bit more complicated is that in the PVM
case "interrupts disabled" maps to vcpu->evtchn_upcall_mask.
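(For reference, roughly what the PV irq ops in arch/x86/xen/irq.c of this era
do - quoted from memory, so a sketch:)

static void xen_irq_disable(void)
{
	/* no hypercall: the guest just sets the per-vcpu mask */
	preempt_disable();
	this_cpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
	preempt_enable_no_resched();
}

static void xen_irq_enable(void)
{
	struct vcpu_info *vcpu = this_cpu_read(xen_vcpu);

	vcpu->evtchn_upcall_mask = 0;
	barrier();	/* unmask before checking for pending upcalls */
	if (unlikely(vcpu->evtchn_upcall_pending))
		xen_force_evtchn_callback();
}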
>
>>>> So I tried below approach and that seems to be surviving the previously
>>>> breaking testcase for much longer than anything I tried before.
>>>
>>> If that indeed fixes your problem, then (minus eventual problems with the
>>> scope of the interrupts-enabled window) this rather points at a bug in
>>> the users of the spinlock interfaces.
>>
>> I would be pragmatic here, none of the other current implementations seem
>> to re-enable interrupts and so this only affects xen pv.
>
> I don't think you really checked - the first arch I looked at (s390, as being
> the most obvious one to look at when it comes to virtualization) quite
> prominently re-enables interrupts in arch_spin_lock_wait_flags().
No, I assumed that you saying the native kernel did it prior to the ticket lock
conversion, meant that this involves more historic search. And yes, s390 is doing
virtualization quite a bit back into history. Just not paravirtualization.
And when I look at arch_spin_lock_wait_flags(), enabling/disabling is done close
by (at least I am not wandering off into some hypercall fog).
>
>> And how much really is gained from enabling it compared to the risk of
>> being affected by something that nobody else will be?
>
> The main difference between the native and virtualized cases is that the
> period of time you spend waiting for the lock to become available is pretty
> much unbounded (even more so without ticket locks), and keeping interrupts
> disabled for such an extended period of time is just going to ask for other
> problems.
So I am not sure I followed all of the right paths here, but I think xen_poll_irq
ends up doing a hypercall via syscall. Syscall seems to mask off some bits of the
flags (maybe irq) but certainly that will not translate automatically into the
upcall mask.
Then, again hopefully the right place, in the mini-os part, in the
hypervisor_callback there is some annotation about leaving the events masked off
until having checked for being already interrupted. Could it be the same mask
that is checked here that the guest has cleared to enable interrupts... Would
that be expected?
>
> And note that this isn't the case just for Xen PV - all virtualization
> scenarios suffer from that.
>
> Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 14:03 ` Stefan Bader
@ 2012-10-19 14:49 ` Jan Beulich
2012-10-19 14:57 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-19 14:49 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 19.10.12 at 16:03, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 19.10.2012 11:24, Jan Beulich wrote:
>>>>> On 19.10.12 at 10:33, Stefan Bader <stefan.bader@canonical.com> wrote:
>>> I would be pragmatic here, none of the other current implementations seem
>>> to re-enable interrupts and so this only affects xen pv.
>>
>> I don't think you really checked - the first arch I looked at (s390, as being
>> the most obvious one to look at when it comes to virtualization) quite
>> prominently re-enables interrupts in arch_spin_lock_wait_flags().
>
> No, I assumed that you saying the native kernel did it prior to the ticket lock
> conversion,
I indeed did say that. Just go look back at the old code.
Jan
> meant that this involves more historic search. And yes, s390 is doing
> virtualization quite a bit back into history. Just not paravirtualization.
> And when I look at arch_spin_lock_wait_flags(), enabling/disabling is done
> close by (at least I am not wandering off into some hypercall fog).
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 14:49 ` Jan Beulich
@ 2012-10-19 14:57 ` Stefan Bader
2012-10-19 15:08 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-19 14:57 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, xen-devel@lists.xen.org, Ian Campbell,
Konrad Rzeszutek Wilk
On 19.10.2012 16:49, Jan Beulich wrote:
>> No, I assumed that you saying the native kernel did it prior to the ticket
>> lock conversion,
>
> I indeed did say that. Just go look back at the old code.
No, sorry, but no. Just the fact that the old code did it does not help me here.
I do believe you there. Still it seems there is something wrong with doing it
there in a pv guest. Though I cannot give a satisfying explanation, yet.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 14:57 ` Stefan Bader
@ 2012-10-19 15:08 ` Jan Beulich
2012-10-19 15:21 ` Stefan Bader
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2012-10-19 15:08 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 19.10.12 at 16:57, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 19.10.2012 16:49, Jan Beulich wrote:
>
>>> No, I assumed that you saying the native kernel did it prior to the ticket
>>> lock conversion,
>>
>> I indeed did say that. Just go look back at the old code.
>
> No, sorry, but no. Just the fact that the old code did it does not help me
> here. I do believe you there. Still it seems there is something wrong with
> doing it there in a pv guest.
PV guest may be a little broad - did you observe such a problem
with a recent kernel of ours?
> Though I cannot give a satisfying explanation, yet.
But that's going to be necessary in order to judge whether the
brute force approach you take is really the only alternative (and
whether it's fixing the problem rather than just making it less
likely).
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 15:08 ` Jan Beulich
@ 2012-10-19 15:21 ` Stefan Bader
2012-10-19 15:33 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Stefan Bader @ 2012-10-19 15:21 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
On 19.10.2012 17:08, Jan Beulich wrote:
>>>> On 19.10.12 at 16:57, Stefan Bader <stefan.bader@canonical.com> wrote:
>> On 19.10.2012 16:49, Jan Beulich wrote:
>>
>>>> No, I assumed that you saying the native kernel did it prior to the ticket
>>>> lock conversion,
>>>
>>> I indeed did say that. Just go look back at the old code.
>>
>> No, sorry, but no. Just the fact that the old code did it does not help me
>> here. I do believe you there. Still it seems there is something wrong with
>> doing it there in a pv guest.
>
> PV guest may be a little broad - did you observe such a problem
> with a recent kernel of ours?
3.2 (ok, older) and 3.5 up to now; I certainly can and should retry with at least
3.6. And on the host side it was on Amazon EC2 and locally with a 4.1.2 host.
>
>> Though I cannot give a satisfying explanation, yet.
>
> But that's going to be necessary in order to judge whether the
> brute force approach you take is really the only alternative (and
> whether it's fixing the problem rather than just making it less
> likely).
I know. Unfortunately that means digging quite deeply into the not-so-well-known.
And it does not help either that the backtraces in the dump do not seem completely
trustworthy. For example, vcpus that the host says are polling on a certain event
channel do not seem to be sitting in the hypercall.
>
> Jan
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Xen PVM: Strange lockups when running PostgreSQL load
2012-10-19 15:21 ` Stefan Bader
@ 2012-10-19 15:33 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2012-10-19 15:33 UTC (permalink / raw)
To: Stefan Bader
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Ian Campbell,
xen-devel@lists.xen.org
>>> On 19.10.12 at 17:21, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 19.10.2012 17:08, Jan Beulich wrote:
>>>>> On 19.10.12 at 16:57, Stefan Bader <stefan.bader@canonical.com> wrote:
>>> On 19.10.2012 16:49, Jan Beulich wrote:
>>>
>>>>> No, I assumed that you saying the native kernel did it prior to the
>>>>> ticket lock conversion,
>>>>
>>>> I indeed did say that. Just go look back at the old code.
>>>
>>> No, sorry, but no. Just the fact that the old code did it does not help me
>>> here. I do believe you there. Still it seems there is something wrong with
>>> doing it there in a pv guest.
>>
>> PV guest may be a little broad - did you observe such a problem
>> with a recent kernel of ours?
>
> 3.2 (ok, older) and 3.5 up to now; I certainly can and should retry with at
> least 3.6. And on the host side it was on Amazon EC2 and locally with a
> 4.1.2 host.
Anything 3.0 and newer is more or less the same spinlock-wise; did
you ever open a bug report for this?
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads: [~2012-10-19 15:33 UTC | newest]
Thread overview: 29+ messages
2012-10-17 13:10 Xen PVM: Strange lockups when running PostgreSQL load Stefan Bader
2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45 ` Stefan Bader
2012-10-17 13:55 ` Ian Campbell
2012-10-17 15:21 ` Stefan Bader
2012-10-17 15:35 ` Andrew Cooper
2012-10-17 16:27 ` Stefan Bader
2012-10-17 17:46 ` Andrew Cooper
2012-10-18 7:00 ` Jan Beulich
2012-10-18 7:08 ` Jan Beulich
2012-10-18 7:38 ` Stefan Bader
2012-10-18 7:48 ` Ian Campbell
2012-10-18 10:20 ` Stefan Bader
2012-10-18 10:47 ` Jan Beulich
2012-10-18 12:43 ` Stefan Bader
2012-10-18 20:52 ` Stefan Bader
2012-10-19 7:10 ` Stefan Bader
2012-10-19 8:06 ` Jan Beulich
2012-10-19 8:33 ` Stefan Bader
2012-10-19 9:24 ` Jan Beulich
2012-10-19 14:03 ` Stefan Bader
2012-10-19 14:49 ` Jan Beulich
2012-10-19 14:57 ` Stefan Bader
2012-10-19 15:08 ` Jan Beulich
2012-10-19 15:21 ` Stefan Bader
2012-10-19 15:33 ` Jan Beulich
2012-10-18 7:24 ` Stefan Bader
2012-10-17 14:51 ` Jan Beulich
2012-10-17 15:12 ` Andrew Cooper