xen-devel.lists.xenproject.org archive mirror
From: Stefan Bader <stefan.bader@canonical.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Thu, 18 Oct 2012 12:20:14 +0200
Message-ID: <507FD7DE.2010209@canonical.com>
In-Reply-To: <1350546483.28188.25.camel@dagon.hellion.org.uk>



On 18.10.2012 09:48, Ian Campbell wrote:
> On Thu, 2012-10-18 at 08:38 +0100, Stefan Bader wrote:
>> On 18.10.2012 09:08, Jan Beulich wrote:
>>>>>> On 18.10.12 at 09:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>>> On 17.10.12 at 17:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> In each case, the event channels are masked (no surprise given the
>>>>> conversation so far on this thread), and have no pending events. 
>>>>> Therefore, I believe we are looking at the same bug.
>>>>
>>>> That seems very unlikely (albeit not impossible) to me, given that
>>>> the non-pvops kernel uses ticket locks while the pvops one doesn't.
>>>
>>> And in fact we had a similar problem with our original ticket lock
>>> implementation, exposed by an open coded lock in the scheduler's
>>> run queue management. But that was really ticket lock specific,
>>> in that the fact that a CPU could passively become the owner of
>>> a lock while polling - that's impossible with pvops' byte locks afaict.
>>
>> One of the trains of thought I had was whether it could happen that a cpu is
>> polling and the task gets moved. But I don't think it can happen, as the
>> hypercall is unlikely to be a place where any scheduling happens (preempt is
>> none). And it would be much more common...
>>
>> One detail which I hope someone can fill in is the whole "interrupted spinlock"
>> thing: saving the last lock pointer stored in the per-cpu lock_spinners and so
>> on. Is that really only for spinlocks taken without interrupts disabled, or am I
>> missing something there?
> 
> spinning_lock() returns the old lock which the caller is expected to
> remember and replace via unspinning_lock() -- it effectively implements
> a stack of locks which are being waited on. xen_spin_lock_slow (the only
> caller) appears to do this correctly from a brief inspection.

Yes, but just *when* can there be a stack of (spin)locks? The poll_irq
hypercall seems to be an active process (in the sense of not preempting to
another task). How could there be a situation in which another lock is being
taken on the same cpu?
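
To make the save/restore pattern described above concrete, here is a minimal
userspace sketch in C. It is a model under stated assumptions, not the kernel's
code: only the names spinning_lock(), unspinning_lock() and lock_spinners come
from this thread, while the struct layout, the single global standing in for one
cpu's per-cpu slot, and demo_nested_wait() are hypothetical. The nesting is
triggered by an assumed interrupt arriving while the outer wait is registered;
whether that can really happen during the poll is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's lock type (hypothetical layout). */
struct xen_spinlock {
    unsigned char lock;      /* 0 = free, 1 = held (byte lock) */
    unsigned short spinners; /* number of waiters in the slow path */
};

/* Models one cpu's per-cpu lock_spinners slot. */
static struct xen_spinlock *lock_spinners;

/* Record xl as the lock this cpu is waiting on; return the previous one. */
static struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
    struct xen_spinlock *prev = lock_spinners;

    lock_spinners = xl;
    xl->spinners++;
    return prev;
}

/* Undo spinning_lock(): drop the waiter count and restore the saved lock. */
static void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev)
{
    xl->spinners--;
    lock_spinners = prev;
}

/* Hypothetical nested wait; returns 1 if every slot is restored correctly. */
static int demo_nested_wait(void)
{
    struct xen_spinlock a = {1, 0}, b = {1, 0};

    /* Outer slow path: this cpu starts waiting on lock a. */
    struct xen_spinlock *prev_a = spinning_lock(&a);
    assert(prev_a == NULL && lock_spinners == &a);

    /* Assumed: an interrupt arrives here and its handler waits on lock b. */
    struct xen_spinlock *prev_b = spinning_lock(&b);
    assert(prev_b == &a && lock_spinners == &b);
    unspinning_lock(&b, prev_b); /* handler is done with b */

    /* Back in the outer wait, the slot must point at a again. */
    int restored = (lock_spinners == &a);

    unspinning_lock(&a, prev_a);
    return restored && lock_spinners == NULL &&
           a.spinners == 0 && b.spinners == 0;
}
```

The returned pointer is what makes this a stack: each nesting level keeps the
previous entry in a local variable and puts it back on the way out, so no
explicit array is needed.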
> 
> Is there any chance this is just a simple AB-BA or similar type
> deadlock? Do we have data which suggests all vCPUs are waiting on the
> same lock or just that they are waiting on some lock? I suppose lockdep
> (which I think you mentioned before?) would have caught this, unless pv
> locks somehow confound it?

In the one case where I looked deeper into the tasks that appeared to be on a
cpu, one cpu was waiting to signal a task that looked to have just been
scheduled out, and the cpu that task had been running on was doing an idle
balance and waiting on the lock for cpu#0's runqueue. cpu#0 itself seemed to be
waiting in the slow path (the lock pointer was in lock_spinners[0]), but the
lock itself was 0.
There is a chance, though, that this is always just a coincidental state in
which the lock had just been released, and that it has more to do with how the
Xen stack does a guest dump. So the task would be to find who holds the other
lock.
Unfortunately, at least a kernel with full lock debugging enabled is
sufficiently different in timing that I cannot reproduce the issue on a test
machine. And I have no data from the crashes reported in production.

> 
> Ian.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 



