Re: Xen PVM: Strange lockups when running PostgreSQL load

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Stefan Bader <stefan.bader@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Wed, 17 Oct 2012 16:35:20 +0100	[thread overview]
Message-ID: <507ED038.8000806@citrix.com> (raw)
In-Reply-To: <507ECD06.2050407@canonical.com>


[-- Attachment #1.1: Type: text/plain, Size: 8551 bytes --]


On 17/10/12 16:21, Stefan Bader wrote:
> On 17.10.2012 15:55, Ian Campbell wrote:
>> On Wed, 2012-10-17 at 14:28 +0100, Andrew Cooper wrote:
>>> In our case, certain processes were locking up, and it turned out that
>>> the kernel was issuing SCHOP_poll hypercalls (same the stack trace on
>>> your launchpad ticket) on its own spinlock IPI event channel
>>> (understandable, as its a spinlock), but with the event channel
>>> masked, so it would never wake up from the poll.
>>
>> I think (but you might want to check) that SCHEDOP_poll works (or is
>> supposed to work!) regardless of whether the specific evtchn you ask for
>> is masked or not.
>
> I was assuming it to be supposed to work at least in Xen 4.1.2. Or at
least if
> it did not I would hope to catch VCPUs rather sitting on the hypercall
than
> doing nothing. Of course I cannot say how reliable information in
crash is as
> this is something new to do after Daniel fixed crash.
>
>>
>> The Linux PV spinlock implementation takes advantage of this because it
>> never wants to take a real interrupt from the spinlock poller evtchn.
>
> Right, I probably should have realized that. Though I guess it is still
> interesting to see whether the channel is pending.
>
> So when just recreating, I got the q and e info which is (assuming the
guest
> domain is enough):
>
> (XEN) Event channel information for domain 1:
> (XEN) Polling vCPUs: {1,4,6}
> (XEN) port [p/m]
> (XEN) 1 [0/0]: s=3 n=0 d=0 p=68 x=0
> (XEN) 2 [1/0]: s=3 n=0 d=0 p=69 x=0
> (XEN) 3 [1/0]: s=5 n=0 v=0 x=0
> (XEN) 4 [1/1]: s=6 n=0 x=0
> (XEN) 5 [1/0]: s=6 n=0 x=0
> (XEN) 6 [0/0]: s=6 n=0 x=0
> (XEN) 7 [0/0]: s=5 n=0 v=1 x=0
> (XEN) 8 [0/0]: s=6 n=0 x=0
> (XEN) 9 [1/0]: s=5 n=1 v=0 x=0
> (XEN) 10 [0/1]: s=6 n=1 x=0
> (XEN) 11 [1/0]: s=6 n=1 x=0
> (XEN) 12 [0/0]: s=6 n=1 x=0
> (XEN) 13 [0/0]: s=5 n=1 v=1 x=0
> (XEN) 14 [0/0]: s=6 n=1 x=0
> (XEN) 15 [0/0]: s=5 n=2 v=0 x=0
> (XEN) 16 [1/1]: s=6 n=2 x=0
> (XEN) 17 [0/0]: s=6 n=2 x=0
> (XEN) 18 [0/0]: s=6 n=2 x=0
> (XEN) 19 [0/0]: s=5 n=2 v=1 x=0
> (XEN) 20 [0/0]: s=6 n=2 x=0
> (XEN) 21 [0/0]: s=5 n=3 v=0 x=0
> (XEN) 22 [1/1]: s=6 n=3 x=0
> (XEN) 23 [0/0]: s=6 n=3 x=0
> (XEN) 24 [0/0]: s=6 n=3 x=0
> (XEN) 25 [0/0]: s=5 n=3 v=1 x=0
> (XEN) 26 [0/0]: s=6 n=3 x=0
> (XEN) 27 [1/0]: s=5 n=4 v=0 x=0
> (XEN) 28 [0/1]: s=6 n=4 x=0
> (XEN) 29 [1/0]: s=6 n=4 x=0
> (XEN) 30 [0/0]: s=6 n=4 x=0
> (XEN) 31 [0/0]: s=5 n=4 v=1 x=0
> (XEN) 32 [0/0]: s=6 n=4 x=0
> (XEN) 33 [0/0]: s=5 n=5 v=0 x=0
> (XEN) 34 [0/1]: s=6 n=5 x=0
> (XEN) 35 [0/0]: s=6 n=5 x=0
> (XEN) 36 [0/0]: s=6 n=5 x=0
> (XEN) 37 [0/0]: s=5 n=5 v=1 x=0
> (XEN) 38 [0/0]: s=6 n=5 x=0
> (XEN) 39 [1/0]: s=5 n=6 v=0 x=0
> (XEN) 40 [0/1]: s=6 n=6 x=0
> (XEN) 41 [1/0]: s=6 n=6 x=0
> (XEN) 42 [0/0]: s=6 n=6 x=0
> (XEN) 43 [0/0]: s=5 n=6 v=1 x=0
> (XEN) 44 [0/0]: s=6 n=6 x=0event channel
> (XEN) 45 [0/0]: s=5 n=7 v=0 x=0
> (XEN) 46 [1/1]: s=6 n=7 x=0
> (XEN) 47 [0/0]: s=6 n=7 x=0
> (XEN) 48 [0/0]: s=6 n=7 x=0
> (XEN) 49 [0/0]: s=5 n=7 v=1 x=0
> (XEN) 50 [0/0]: s=6 n=7 x=0
> (XEN) 51 [0/0]: s=3 n=7 d=0 p=70 x=0
> (XEN) 52 [0/0]: s=3 n=0 d=0 p=71 x=0
> (XEN) 53 [0/0]: s=3 n=0 d=0 p=72 x=0
> (XEN) 54 [0/0]: s=3 n=0 d=0 p=73 x=0
> (XEN) 55 [1/0]: s=3 n=0 d=0 p=74 x=0
>
> [maybe someone can tell me what the s,n,d,p and x mean]

s = state.  0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain, 4 =
pirq, 5 = virq, 6 = ipi
n = target vcpu id to notify
x = boolean indicating whether xen is a consumer of the event channel or
not.

d = target domain (when appropriate)  In this case, p is the target port.

>
>
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) XenPage 00000000008000ab: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000aa: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000dfae4: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask
= 01
> dirty_cpus={3} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU2: CPU4 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU3: CPU5 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU5: CPU7 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={7} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU7: CPU6 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask
= 01
> dirty_cpus={6} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 5, stat 0/0/0)
> (XEN) Notifying guest 0:1 (virq 1, port 11, stat 0/0/0)
> (XEN) Notifying guest 0:2 (virq 1, port 17, stat 0/0/0)
> (XEN) Notifying guest 0:3 (virq 1, port 23, stat 0/0/0)
> (XEN) Notifying guest 0:4 (virq 1, port 29, stat 0/0/0)
> (XEN) Notifying guest 0:5 (virq 1, port 35, stat 0/0/0)
> (XEN) Notifying guest 0:6 (virq 1, port 41, stat 0/0/0)
> (XEN) Notifying guest 0:7 (virq 1, port 47, stat 0/0/0)
> (XEN) Notifying guest 1:0 (virq 1, port 7, stat 0/0/-1)
> (XEN) Notifying guest 1:1 (virq 1, port 13, stat 0/0/-1)
> (XEN) Notifying guest 1:2 (virq 1, port 19, stat 0/0/0)
> (XEN) Notifying guest 1:3 (virq 1, port 25, stat 0/0/0)
> (XEN) Notifying guest 1:4 (virq 1, port 31, stat 0/0/-1)
> (XEN) Notifying guest 1:5 (virq 1, port 37, stat 0/0/0)
> (XEN) Notifying guest 1:6 (virq 1, port 43, stat 0/0/-1)
> (XEN) Notifying guest 1:7 (virq 1, port 49, stat 0/0/0)

So in this case, vcpu 1 is in a poll, on port 10, which is an IPI event
channel for itself.

Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 60.

In each case, the event channels are masked (no surprise given the
conversation so far on this thread), and have no pending events. 
Therefore, I believe we are looking at the same bug.


>
>
> When the guest was unresponsive the console would still show:
> [10174.372092] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 1}
(detected
> by 4, t=15002 jiffies)
> [10284.448089] INFO: rcu_bh detected stalls on CPUs/tasks: { 0 1 4 6}
(detected
> by 5, t=15004 jiffies)
>
> in a repeating pattern. So I take the above as cpus 1,4 and 6 are
polling. From
> the dump and the content of lock_spinners I get:
>
> cpu 0 and 1 -> ffff8803bfc13700 (which is runqueue[0] and is unlocked
again)
> cpu 4 and 6 -> ffffffff81f15ef0 (which is blkif_io_lock and is locked)

I wonder if there is possibly a race condition between notifying that a
lock has been unlocked, and another vcpu trying to poll after deciding
that the lock is locked.

The other option is that there is a bug in working out which event
channel to notify when a lock is unlocked.

~Andrew

>
>
> Backtraces would be somewhat inconsistent (as always). Note, I should
mention
> that I still had a kernel with my patch applied on that guest. That
changes
> things a bit (actually it takes a bit longer to hang but again that
might be
> just a matter of timing). The strange lock state of 2 spinners on an
unlocked
> lock remains the same with or without it.
>
> One question about the patch actually, would anybody think that there
could be a
> case where the unlocking cpu has itself on the spinners list? I did
not think so
> but that might be wrong.
>>
>> The IRQ handler for the spinlock evtchn in Linux is:
>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>> {
>> BUG();
>> return IRQ_HANDLED;
>> }
>>
>> and right after we register it:
>> disable_irq(irq); /* make sure it's never delivered */
>>
>> The is no enable -- ignoring bugs of which there have been couple of
>> instances, but those trigger the BUG() so are pretty obvious.
>>
>> Ian.
>>
>>
>
>
>
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #1.2: Type: text/html, Size: 11966 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2012-10-17 15:35 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-17 13:10 Xen PVM: Strange lockups when running PostgreSQL load Stefan Bader
2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45   ` Stefan Bader
2012-10-17 13:55   ` Ian Campbell
2012-10-17 15:21     ` Stefan Bader
2012-10-17 15:35       ` Andrew Cooper [this message]
2012-10-17 16:27         ` Stefan Bader
2012-10-17 17:46           ` Andrew Cooper
2012-10-18  7:00         ` Jan Beulich
2012-10-18  7:08           ` Jan Beulich
2012-10-18  7:38             ` Stefan Bader
2012-10-18  7:48               ` Ian Campbell
2012-10-18 10:20                 ` Stefan Bader
2012-10-18 10:47                   ` Jan Beulich
2012-10-18 12:43                     ` Stefan Bader
2012-10-18 20:52                       ` Stefan Bader
2012-10-19  7:10                         ` Stefan Bader
2012-10-19  8:06                         ` Jan Beulich
2012-10-19  8:33                           ` Stefan Bader
2012-10-19  9:24                             ` Jan Beulich
2012-10-19 14:03                               ` Stefan Bader
2012-10-19 14:49                                 ` Jan Beulich
2012-10-19 14:57                                   ` Stefan Bader
2012-10-19 15:08                                     ` Jan Beulich
2012-10-19 15:21                                       ` Stefan Bader
2012-10-19 15:33                                         ` Jan Beulich
2012-10-18  7:24           ` Stefan Bader
2012-10-17 14:51   ` Jan Beulich
2012-10-17 15:12     ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=507ED038.8000806@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=stefan.bader@canonical.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).