From: Mukesh Rathor <mukesh.rathor@oracle.com>
To: "Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
Jan Beulich <jbeulich@novell.com>
Subject: DOM0 Hang on a large box....
Date: Thu, 1 Sep 2011 12:20:04 -0700
Message-ID: <20110901122004.2c12f34f@mantra.us.oracle.com>
Hi,
I'm looking at a system hang on a large box: 160 cpus, 2TB. Dom0 is
booted with 160 vcpus (don't ask me why :)), and an HVM guest is started
with over 1.5TB of RAM and 128 vcpus. The system hangs without much activity
after a couple of hours. This is Xen 4.0.2 with a 2.6.32-based 64-bit dom0.
During the hang I discovered:
Most of the dom0 vcpus are in double_lock_balance, spinning on one of the locks:
@ ffffffff800083aa: 0:hypercall_page+3aa pop %r11
@ ffffffff802405eb: 0:xen_spin_wait+19b test %eax, %eax
@ ffffffff8035969b: 0:_spin_lock+10b test %al, %al
@ ffffffff800342f5: 0:double_lock_balance+65 mov %rbx, %rdi
@ ffffffff80356fc0: 0:thread_return+37e mov 0x880(%r12), %edi
static int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
        __releases(this_rq->lock)
        __acquires(busiest->lock)
        __acquires(this_rq->lock)
{
        int ret = 0;
        if (unlikely(!spin_trylock(&busiest->lock))) {
                if (busiest < this_rq) {
                        spin_unlock(&this_rq->lock);
                        spin_lock(&busiest->lock);
                        spin_lock_nested(&this_rq->lock, SINGLE_DEPTH_NESTING);
                        ret = 1;
                } else
                        spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING);
        }
        return ret;
}
The lock is taken, but I'm not sure who the owner is. The lock struct:
@ ffff8800020e2480: 2f102e70 0000000c 00000002 00000000
so slock is: 2f102e70
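As a rough aid for reading that slock value, here is a minimal sketch that splits a 32-bit ticket-lock word into two 16-bit halves and computes how far apart they are. The layout is an assumption, not something the dump confirms: it supposes the low half is the ticket currently being served and the high half is the next ticket to hand out, which is how ticket locks are commonly laid out when more than 255 CPUs must be supported. The names split_slock and ticket_distance are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical view of a 32-bit ticket-lock word as two 16-bit counters.
 * ASSUMPTION: low half = ticket being served, high half = next ticket
 * to be handed out. The real layout in this kernel may differ.
 */
struct ticket_halves {
    uint16_t low;   /* assumed: current owner's ticket */
    uint16_t high;  /* assumed: next free ticket */
};

/* Split the raw lock word into its two 16-bit halves. */
static inline struct ticket_halves split_slock(uint32_t slock)
{
    struct ticket_halves t;

    t.low  = (uint16_t)(slock & 0xffffu);
    t.high = (uint16_t)(slock >> 16);
    return t;
}

/*
 * Distance from the low half to the high half, modulo 2^16 so that
 * counter wraparound is handled. Under the assumed layout this is the
 * number of tickets handed out but not yet served (holder included).
 */
static inline uint16_t ticket_distance(struct ticket_halves t)
{
    return (uint16_t)(t.high - t.low);
}
```

For slock = 2f102e70 this yields low = 0x2e70, high = 0x2f10, and a distance of 0xa0 = 160, which happens to equal the number of dom0 vcpus. Whether that is meaningful depends on the actual lock layout in this kernel.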
The remaining vcpus are idling:
ffffffff800083aa: 0:hypercall_page+3aa pop %r11
ffffffff8000f0c7: 0:xen_safe_halt+f7 addq $0x18, %rsp
ffffffff8000a5c5: 0:cpu_idle+65 jmp 0:cpu_idle+4e
ffffffff803558fe: 0:cpu_bringup_and_idle+e leave
But the baffling thing is that the vcpu upcall mask is set on these idle vcpus.
The block schedop call does local_event_delivery_enable() first thing, so the
mask should be clear!
Another baffling thing is that the dom0 upcall mask looks fishy:
@ ffff83007f4dba00: 4924924924924929 2492492492492492
@ ffff83007f4dba10: 9249249249249249 4924924924924924
@ ffff83007f4dba20: 2492492492492492 9249249249249249
@ ffff83007f4dba30: 4924924924924924 0000000092492492
@ ffff83007f4dba40: 0000000000000000 0000000000000000
@ ffff83007f4dba50: 0000000000000000 ffffffffc0000000
@ ffff83007f4dba60: ffffffffffffffff ffffffffffffffff
@ ffff83007f4dba70: ffffffffffffffff ffffffffffffffff
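To make the pattern in that dump easier to see, here is a minimal sketch that treats the sixteen words as one 1024-bit bitmap and counts set bits in a range, optionally checking them against a fixed stride. It assumes the dump shows 64-bit words in ascending address order and that bit 0 is the least significant bit of the first word. The helper names (bitmap_test, bitmap_count, bitmap_off_stride) are hypothetical and not taken from Xen or Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* The sixteen 64-bit words from the dump, in address order (assumed). */
static const uint64_t mask_words[16] = {
    0x4924924924924929ULL, 0x2492492492492492ULL,
    0x9249249249249249ULL, 0x4924924924924924ULL,
    0x2492492492492492ULL, 0x9249249249249249ULL,
    0x4924924924924924ULL, 0x0000000092492492ULL,
    0x0000000000000000ULL, 0x0000000000000000ULL,
    0x0000000000000000ULL, 0xffffffffc0000000ULL,
    0xffffffffffffffffULL, 0xffffffffffffffffULL,
    0xffffffffffffffffULL, 0xffffffffffffffffULL,
};

/* Return 1 if bit `bit` is set; bit 0 is assumed to be the LSB of word 0. */
static inline int bitmap_test(const uint64_t *words, size_t bit)
{
    return (int)((words[bit / 64] >> (bit % 64)) & 1u);
}

/* Count the set bits with index in the inclusive range [first, last]. */
static inline size_t bitmap_count(const uint64_t *words,
                                  size_t first, size_t last)
{
    size_t n = 0;

    for (size_t bit = first; bit <= last; bit++)
        n += (size_t)bitmap_test(words, bit);
    return n;
}

/*
 * Count the set bits in [first, last] whose index does NOT satisfy
 * index % stride == phase. A result of 0 means every set bit in the
 * range sits on the given stride.
 */
static inline size_t bitmap_off_stride(const uint64_t *words,
                                       size_t first, size_t last,
                                       size_t stride, size_t phase)
{
    size_t n = 0;

    for (size_t bit = first; bit <= last; bit++)
        if (bitmap_test(words, bit) && bit % stride != phase)
            n++;
    return n;
}
```

Read this way, the first eight words hold 161 set bits: bits 0 and 3, plus every third bit from 5 through 479. Bits 480 through 733 are clear, and bits 734 through 1023 are all set. What each bit position corresponds to is not established by the dump itself.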
Finally, ticket spinlocks are in use. Jan, what is the largest
system this was tested on? Have you seen this before?
thanks,
Mukesh