Re: Xen PVM: Strange lockups when running PostgreSQL load

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Stefan Bader <stefan.bader@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen PVM: Strange lockups when running PostgreSQL load
Date: Wed, 17 Oct 2012 16:35:20 +0100	[thread overview]
Message-ID: <507ED038.8000806@citrix.com> (raw)
In-Reply-To: <507ECD06.2050407@canonical.com>


[-- Attachment #1.1: Type: text/plain, Size: 8551 bytes --]


On 17/10/12 16:21, Stefan Bader wrote:
> On 17.10.2012 15:55, Ian Campbell wrote:
>> On Wed, 2012-10-17 at 14:28 +0100, Andrew Cooper wrote:
>>> In our case, certain processes were locking up, and it turned out that
>>> the kernel was issuing SCHOP_poll hypercalls (same the stack trace on
>>> your launchpad ticket) on its own spinlock IPI event channel
>>> (understandable, as its a spinlock), but with the event channel
>>> masked, so it would never wake up from the poll.
>>
>> I think (but you might want to check) that SCHEDOP_poll works (or is
>> supposed to work!) regardless of whether the specific evtchn you ask for
>> is masked or not.
>
> I was assuming it to be supposed to work at least in Xen 4.1.2. Or at
least if
> it did not I would hope to catch VCPUs rather sitting on the hypercall
than
> doing nothing. Of course I cannot say how reliable information in
crash is as
> this is something new to do after Daniel fixed crash.
>
>>
>> The Linux PV spinlock implementation takes advantage of this because it
>> never wants to take a real interrupt from the spinlock poller evtchn.
>
> Right, I probably should have realized that. Though I guess it is still
> interesting to see whether the channel is pending.
>
> So when just recreating, I got the q and e info which is (assuming the
guest
> domain is enough):
>
> (XEN) Event channel information for domain 1:
> (XEN) Polling vCPUs: {1,4,6}
> (XEN) port [p/m]
> (XEN) 1 [0/0]: s=3 n=0 d=0 p=68 x=0
> (XEN) 2 [1/0]: s=3 n=0 d=0 p=69 x=0
> (XEN) 3 [1/0]: s=5 n=0 v=0 x=0
> (XEN) 4 [1/1]: s=6 n=0 x=0
> (XEN) 5 [1/0]: s=6 n=0 x=0
> (XEN) 6 [0/0]: s=6 n=0 x=0
> (XEN) 7 [0/0]: s=5 n=0 v=1 x=0
> (XEN) 8 [0/0]: s=6 n=0 x=0
> (XEN) 9 [1/0]: s=5 n=1 v=0 x=0
> (XEN) 10 [0/1]: s=6 n=1 x=0
> (XEN) 11 [1/0]: s=6 n=1 x=0
> (XEN) 12 [0/0]: s=6 n=1 x=0
> (XEN) 13 [0/0]: s=5 n=1 v=1 x=0
> (XEN) 14 [0/0]: s=6 n=1 x=0
> (XEN) 15 [0/0]: s=5 n=2 v=0 x=0
> (XEN) 16 [1/1]: s=6 n=2 x=0
> (XEN) 17 [0/0]: s=6 n=2 x=0
> (XEN) 18 [0/0]: s=6 n=2 x=0
> (XEN) 19 [0/0]: s=5 n=2 v=1 x=0
> (XEN) 20 [0/0]: s=6 n=2 x=0
> (XEN) 21 [0/0]: s=5 n=3 v=0 x=0
> (XEN) 22 [1/1]: s=6 n=3 x=0
> (XEN) 23 [0/0]: s=6 n=3 x=0
> (XEN) 24 [0/0]: s=6 n=3 x=0
> (XEN) 25 [0/0]: s=5 n=3 v=1 x=0
> (XEN) 26 [0/0]: s=6 n=3 x=0
> (XEN) 27 [1/0]: s=5 n=4 v=0 x=0
> (XEN) 28 [0/1]: s=6 n=4 x=0
> (XEN) 29 [1/0]: s=6 n=4 x=0
> (XEN) 30 [0/0]: s=6 n=4 x=0
> (XEN) 31 [0/0]: s=5 n=4 v=1 x=0
> (XEN) 32 [0/0]: s=6 n=4 x=0
> (XEN) 33 [0/0]: s=5 n=5 v=0 x=0
> (XEN) 34 [0/1]: s=6 n=5 x=0
> (XEN) 35 [0/0]: s=6 n=5 x=0
> (XEN) 36 [0/0]: s=6 n=5 x=0
> (XEN) 37 [0/0]: s=5 n=5 v=1 x=0
> (XEN) 38 [0/0]: s=6 n=5 x=0
> (XEN) 39 [1/0]: s=5 n=6 v=0 x=0
> (XEN) 40 [0/1]: s=6 n=6 x=0
> (XEN) 41 [1/0]: s=6 n=6 x=0
> (XEN) 42 [0/0]: s=6 n=6 x=0
> (XEN) 43 [0/0]: s=5 n=6 v=1 x=0
> (XEN) 44 [0/0]: s=6 n=6 x=0event channel
> (XEN) 45 [0/0]: s=5 n=7 v=0 x=0
> (XEN) 46 [1/1]: s=6 n=7 x=0
> (XEN) 47 [0/0]: s=6 n=7 x=0
> (XEN) 48 [0/0]: s=6 n=7 x=0
> (XEN) 49 [0/0]: s=5 n=7 v=1 x=0
> (XEN) 50 [0/0]: s=6 n=7 x=0
> (XEN) 51 [0/0]: s=3 n=7 d=0 p=70 x=0
> (XEN) 52 [0/0]: s=3 n=0 d=0 p=71 x=0
> (XEN) 53 [0/0]: s=3 n=0 d=0 p=72 x=0
> (XEN) 54 [0/0]: s=3 n=0 d=0 p=73 x=0
> (XEN) 55 [1/0]: s=3 n=0 d=0 p=74 x=0
>
> [maybe someone can tell me what the s,n,d,p and x mean]

s = state.  0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain, 4 =
pirq, 5 = virq, 6 = ipi
n = target vcpu id to notify
x = boolean indicating whether xen is a consumer of the event channel or
not.

d = target domain (when appropriate)  In this case, p is the target port.

>
>
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) XenPage 00000000008000ab: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000aa: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 00000000008000a8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000dfae4: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask
= 01
> dirty_cpus={3} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU2: CPU4 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU3: CPU5 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU5: CPU7 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask
= 00
> dirty_cpus={7} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01,
upcall_mask = 01
> dirty_cpus={} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) VCPU7: CPU6 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask
= 01
> dirty_cpus={6} cpu_affinity={0-127}
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 5, stat 0/0/0)
> (XEN) Notifying guest 0:1 (virq 1, port 11, stat 0/0/0)
> (XEN) Notifying guest 0:2 (virq 1, port 17, stat 0/0/0)
> (XEN) Notifying guest 0:3 (virq 1, port 23, stat 0/0/0)
> (XEN) Notifying guest 0:4 (virq 1, port 29, stat 0/0/0)
> (XEN) Notifying guest 0:5 (virq 1, port 35, stat 0/0/0)
> (XEN) Notifying guest 0:6 (virq 1, port 41, stat 0/0/0)
> (XEN) Notifying guest 0:7 (virq 1, port 47, stat 0/0/0)
> (XEN) Notifying guest 1:0 (virq 1, port 7, stat 0/0/-1)
> (XEN) Notifying guest 1:1 (virq 1, port 13, stat 0/0/-1)
> (XEN) Notifying guest 1:2 (virq 1, port 19, stat 0/0/0)
> (XEN) Notifying guest 1:3 (virq 1, port 25, stat 0/0/0)
> (XEN) Notifying guest 1:4 (virq 1, port 31, stat 0/0/-1)
> (XEN) Notifying guest 1:5 (virq 1, port 37, stat 0/0/0)
> (XEN) Notifying guest 1:6 (virq 1, port 43, stat 0/0/-1)
> (XEN) Notifying guest 1:7 (virq 1, port 49, stat 0/0/0)

So in this case, vcpu 1 is in a poll, on port 10, which is an IPI event
channel for itself.

Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 60.

In each case, the event channels are masked (no surprise given the
conversation so far on this thread), and have no pending events. 
Therefore, I believe we are looking at the same bug.


>
>
> When the guest was unresponsive the console would still show:
> [10174.372092] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 1}
(detected
> by 4, t=15002 jiffies)
> [10284.448089] INFO: rcu_bh detected stalls on CPUs/tasks: { 0 1 4 6}
(detected
> by 5, t=15004 jiffies)
>
> in a repeating pattern. So I take the above as cpus 1,4 and 6 are
polling. From
> the dump and the content of lock_spinners I get:
>
> cpu 0 and 1 -> ffff8803bfc13700 (which is runqueue[0] and is unlocked
again)
> cpu 4 and 6 -> ffffffff81f15ef0 (which is blkif_io_lock and is locked)

I wonder if there is possibly a race condition between notifying that a
lock has been unlocked, and another vcpu trying to poll after deciding
that the lock is locked.

The other option is that there is a bug in working out which event
channel to notify when a lock is unlocked.

~Andrew

>
>
> Backtraces would be somewhat inconsistent (as always). Note, I should
mention
> that I still had a kernel with my patch applied on that guest. That
changes
> things a bit (actually it takes a bit longer to hang but again that
might be
> just a matter of timing). The strange lock state of 2 spinners on an
unlocked
> lock remains the same with or without it.
>
> One question about the patch actually, would anybody think that there
could be a
> case where the unlocking cpu has itself on the spinners list? I did
not think so
> but that might be wrong.
>>
>> The IRQ handler for the spinlock evtchn in Linux is:
>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>> {
>> BUG();
>> return IRQ_HANDLED;
>> }
>>
>> and right after we register it:
>> disable_irq(irq); /* make sure it's never delivered */
>>
>> The is no enable -- ignoring bugs of which there have been couple of
>> instances, but those trigger the BUG() so are pretty obvious.
>>
>> Ian.
>>
>>
>
>
>
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #1.2: Type: text/html, Size: 11966 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2012-10-17 15:35 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-17 13:10 Xen PVM: Strange lockups when running PostgreSQL load Stefan Bader
2012-10-17 13:28 ` Andrew Cooper
2012-10-17 13:45   ` Stefan Bader
2012-10-17 13:55   ` Ian Campbell
2012-10-17 15:21     ` Stefan Bader
2012-10-17 15:35       ` Andrew Cooper [this message]
2012-10-17 16:27         ` Stefan Bader
2012-10-17 17:46           ` Andrew Cooper
2012-10-18  7:00         ` Jan Beulich
2012-10-18  7:08           ` Jan Beulich
2012-10-18  7:38             ` Stefan Bader
2012-10-18  7:48               ` Ian Campbell
2012-10-18 10:20                 ` Stefan Bader
2012-10-18 10:47                   ` Jan Beulich
2012-10-18 12:43                     ` Stefan Bader
2012-10-18 20:52                       ` Stefan Bader
2012-10-19  7:10                         ` Stefan Bader
2012-10-19  8:06                         ` Jan Beulich
2012-10-19  8:33                           ` Stefan Bader
2012-10-19  9:24                             ` Jan Beulich
2012-10-19 14:03                               ` Stefan Bader
2012-10-19 14:49                                 ` Jan Beulich
2012-10-19 14:57                                   ` Stefan Bader
2012-10-19 15:08                                     ` Jan Beulich
2012-10-19 15:21                                       ` Stefan Bader
2012-10-19 15:33                                         ` Jan Beulich
2012-10-18  7:24           ` Stefan Bader
2012-10-17 14:51   ` Jan Beulich
2012-10-17 15:12     ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=507ED038.8000806@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=stefan.bader@canonical.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.