From: Stefan Bader <stefan.bader@canonical.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Thu, 18 Oct 2012 12:20:14 +0200
Message-ID: <507FD7DE.2010209@canonical.com>
In-Reply-To: <1350546483.28188.25.camel@dagon.hellion.org.uk>


On 18.10.2012 09:48, Ian Campbell wrote:
> On Thu, 2012-10-18 at 08:38 +0100, Stefan Bader wrote:
>> On 18.10.2012 09:08, Jan Beulich wrote:
>>>>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> In each case, the event channels are masked (no surprise given the
>>>>> conversation so far on this thread), and have no pending events. 
>>>>> Therefore, I believe we are looking at the same bug.
>>>>
>>>> That seems very unlikely (albeit not impossible) to me, given that
>>>> the non-pvops kernel uses ticket locks while the pvops one doesn't.
>>>
>>> And in fact we had a similar problem with our original ticket lock
>>> implementation, exposed by an open coded lock in the scheduler's
>>> run queue management. But that was really ticket lock specific,
>>> in that the fact that a CPU could passively become the owner of
>>> a lock while polling - that's impossible with pvops' byte locks afaict.
>>
>> One of the trains of thought I had was whether it could happen that a cpu is in
>> polling and the task gets moved. But I don't think it can happen as the
>> hypercall unlikely is a place where any schedule happens (preempt is none). And
>> it would be much more common...
>>
>> One detail which I hope someone can fill in is the whole "interrupted spinlock"
>> thing. Saving the last lock pointer stored on the per-cpu lock_spinners and so
>> on. Is that really only for spinlocks taken without interrupts disabled or do I
>> miss something there?
> 
> spinning_lock() returns the old lock which the caller is expected to
> remember and replace via unspinning_lock() -- it effectively implements
> a stack of locks which are being waited on. xen_spin_lock_slow (the only
> caller) appears to do this correctly from a brief inspection.

Yes, but just *when* can there be a stack of locks (spinlocks)? The poll_irq
hypercall seems to be an active process (in the sense of not preempting to
another task). How could there be a situation in which the same cpu tries to
take another lock while it is already polling for one?
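
A minimal user-space sketch of the save/restore pattern Ian describes may help
frame the question. It is not the kernel code: the struct, the single global
slot standing in for one cpu's lock_spinners entry, and the plain (non-atomic)
counter are simplifications, and simulate_nested_spin() is a hypothetical
walk-through. The nesting step assumes that an interrupt can be delivered while
the outer lock is still being polled for and that its handler takes a second
contended spinlock; whether that can actually happen here is exactly the open
point.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the pv spinlock; field names are illustrative. */
struct xen_spinlock_sketch {
    unsigned char lock;       /* 0 = free, 1 = held */
    unsigned short spinners;  /* waiters currently in the slow path */
};

/* Models ONE cpu's lock_spinners slot (per-cpu in the real kernel). */
static struct xen_spinlock_sketch *lock_spinners_slot;

/* Record that this cpu now waits on xl; hand back what it waited on before. */
static struct xen_spinlock_sketch *spinning_lock(struct xen_spinlock_sketch *xl)
{
    struct xen_spinlock_sketch *prev = lock_spinners_slot;

    lock_spinners_slot = xl;
    xl->spinners++;           /* assumption: atomic in real code */
    return prev;
}

/* Stop waiting on xl and restore the previously saved lock pointer. */
static void unspinning_lock(struct xen_spinlock_sketch *xl,
                            struct xen_spinlock_sketch *prev)
{
    xl->spinners--;
    lock_spinners_slot = prev;
}

/* Hypothetical walk-through of a nested wait on a single cpu. */
static int simulate_nested_spin(void)
{
    struct xen_spinlock_sketch a = { 1, 0 }, b = { 1, 0 };
    struct xen_spinlock_sketch *prev_outer, *prev_inner;

    /* Outer slow path: nothing was being waited on before. */
    prev_outer = spinning_lock(&a);
    assert(prev_outer == NULL && lock_spinners_slot == &a);

    /* ASSUMPTION: an interrupt arrives here and its handler contends on b. */
    prev_inner = spinning_lock(&b);
    assert(prev_inner == &a && lock_spinners_slot == &b);
    unspinning_lock(&b, prev_inner);

    /* Back in the outer wait: the slot points at a again. */
    assert(lock_spinners_slot == &a);
    unspinning_lock(&a, prev_outer);

    return lock_spinners_slot == NULL && a.spinners == 0 && b.spinners == 0;
}
```

In this model the "stack" is only ever as deep as the interrupt nesting: each
level keeps its own prev on its call stack, and the slot always names the
innermost lock being waited on.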
> 
> Is there any chance this is just a simple AB-BA or similar type
> deadlock? Do we have data which suggests all vCPUs are waiting on the
> same lock or just that they are waiting on some lock? I suppose lockdep
> (which I think you mentioned before?) would have caught this, unless pv
> locks somehow confound it?

In the one case where I looked deeper into the tasks that appeared to be on a
cpu, one cpu was waiting to signal a task that looked to have just been
scheduled out, and the cpu that task had been running on was doing an idle
balance that waited on the lock for cpu#0's runqueue. cpu#0 itself seemed to be
waiting in the slow path (the lock pointer was in lock_spinners[0]), but the
lock itself was 0.
There is a chance, though, that this is always just a coincidental state in
which the lock had just been released, and that it has more to do with how the
Xen stack does a guest dump. So the task would be to find who holds the other
lock.
Unfortunately, a kernel with at least full lock debugging enabled is
sufficiently different in timing that I cannot reproduce the issue on a test
machine. And from the crashes reported in production I have no data.

> 
> Ian.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 



