From: Keir Fraser <keir.fraser@eu.citrix.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: Re: VM hung after running sometime
Date: Sun, 19 Sep 2010 12:49:44 +0100
Message-ID: <C8BBB768.2379E%keir.fraser@eu.citrix.com>
In-Reply-To: <BAY121-W13F714A682D65CDD4B5C74DA7D0@phx.gbl>

On 19/09/2010 11:37, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Hi Keir,
>
> Regarding the HVM hang: according to our recent tests, the issue still
> exists. While going through the code I observed something that looks
> abnormal, and I need your help.
>
> We've noticed that when a VM hangs, its VCPU flags value is always 4,
> which indicates _VPF_blocked_in_xen; that flag is set in
> prepare_wait_on_xen_event_channel. I've also noticed that Domain U sets
> up an event channel with domain 0 for each VCPU, and qemu-dm selects on
> the event fd.
>
> notify_via_xen_event_channel is called when Domain U issues a request.
> qemu-dm then gets the event and invokes
> cpu_handle_ioreq (/xen-4.0.0/tools/ioemu-qemu-xen/i386-dm/helper2.c)
> -> cpu_get_ioreq() -> xc_evtchn_unmask(). evtchn_unmask operates on
> evtchn_pending, evtchn_mask, or evtchn_pending_sel.
>
> My confusion is about notify_via_xen_event_channel() -> evtchn_set_pending:
> evtchn_set_pending here is not locked, yet it also operates on
> evtchn_pending, evtchn_mask, or evtchn_pending_sel.

Atomic ops are used to make the operations on evtchn_pending, evtchn_mask,
and evtchn_pending_sel concurrency-safe. Note that the locking in
notify_via_xen_event_channel() is just the same as in, say, evtchn_send():
the local domain's (i.e. DomU's, in this case) event_lock is held, while the
remote domain's (i.e. dom0's, in this case) does not need to be held.
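
To illustrate why the atomic bit operations are enough, here is a minimal
user-space sketch of the same pending/mask/selector protocol, written with C11
<stdatomic.h>. It is a toy model, not Xen code: struct toy_evtchn_state, the
toy_* helpers, the four-word bitmap size and the upcalls counter (standing in
for vcpu_mark_events_pending()) are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      4                       /* hypothetical bitmap size */
#define NR_PORTS      (NR_WORDS * BITS_PER_WORD)

/* Hypothetical stand-in for the shared_info / vcpu_info bitmaps. */
struct toy_evtchn_state {
    atomic_ulong pending[NR_WORDS];
    atomic_ulong mask[NR_WORDS];
    atomic_ulong pending_sel;   /* one bit per word of pending[] */
    atomic_int   upcalls;       /* stands in for vcpu_mark_events_pending() */
};

static void toy_init(struct toy_evtchn_state *s)
{
    for (int i = 0; i < NR_WORDS; i++) {
        atomic_init(&s->pending[i], 0UL);
        atomic_init(&s->mask[i], 0UL);
    }
    atomic_init(&s->pending_sel, 0UL);
    atomic_init(&s->upcalls, 0);
}

/* Atomically set a bit and report whether it was already set. */
static bool toy_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    return (atomic_fetch_or(&addr[nr / BITS_PER_WORD], bit) & bit) != 0;
}

/* Atomically clear a bit and report whether it was previously set. */
static bool toy_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    return (atomic_fetch_and(&addr[nr / BITS_PER_WORD], ~bit) & bit) != 0;
}

static bool toy_test_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    return (atomic_load(&addr[nr / BITS_PER_WORD]) & bit) != 0;
}

static void toy_mask(struct toy_evtchn_state *s, unsigned int port)
{
    assert(port < NR_PORTS);
    (void)toy_test_and_set_bit(port, s->mask);
}

/* Sender side: same order of bit operations as evtchn_set_pending(). */
static int toy_set_pending(struct toy_evtchn_state *s, unsigned int port)
{
    assert(port < NR_PORTS);
    if (toy_test_and_set_bit(port, s->pending))
        return 1;                             /* already pending */
    if (!toy_test_bit(port, s->mask) &&
        !toy_test_and_set_bit((unsigned int)(port / BITS_PER_WORD),
                              &s->pending_sel))
        atomic_fetch_add(&s->upcalls, 1);
    return 0;
}

/* Receiver side: same order of bit operations as evtchn_unmask(). */
static void toy_unmask(struct toy_evtchn_state *s, unsigned int port)
{
    assert(port < NR_PORTS);
    if (toy_test_and_clear_bit(port, s->mask) &&
        toy_test_bit(port, s->pending) &&
        !toy_test_and_set_bit((unsigned int)(port / BITS_PER_WORD),
                              &s->pending_sel))
        atomic_fetch_add(&s->upcalls, 1);
}
```

In this model, if toy_set_pending() finds the port masked it leaves the
pending bit set, and a later toy_unmask() sees that bit and raises the upcall
itself. The default atomic operations are sequentially consistent, so at least
one of the two sides observes the other's update, whichever way they interleave.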

If your domU is stuck in state _VPF_blocked_in_xen, it probably means
qemu-dm is toast. I would investigate whether the qemu-dm process is still
present, still doing useful work, etc.

-- Keir
> I'm afraid this race might cause an event from dom U to qemu-dm to go
> undelivered, but I am not sure, since I still do not fully understand
> where event_mask is set and where event_pending is cleared.
>
> -------------------------notify_via_xen_event_channel-------------------------
> void notify_via_xen_event_channel(int lport)
> {
>     struct evtchn *lchn, *rchn;
>     struct domain *ld = current->domain, *rd;
>     int            rport;
>
>     spin_lock(&ld->event_lock);
>
>     ASSERT(port_is_valid(ld, lport));
>     lchn = evtchn_from_port(ld, lport);
>     ASSERT(lchn->consumer_is_xen);
>
>     if ( likely(lchn->state == ECS_INTERDOMAIN) )
>     {
>         rd    = lchn->u.interdomain.remote_dom;
>         rport = lchn->u.interdomain.remote_port;
>         rchn  = evtchn_from_port(rd, rport);
>         evtchn_set_pending(rd->vcpu[rchn->notify_vcpu_id], rport);
>     }
>
>     spin_unlock(&ld->event_lock);
> }
>
> ----------------------------evtchn_set_pending----------------------
> static int evtchn_set_pending(struct vcpu *v, int port)
> {
>     struct domain *d = v->domain;
>     int vcpuid;
>
>     /*
>      * The following bit operations must happen in strict order.
>      * NB. On x86, the atomic bit operations also act as memory barriers.
>      * There is therefore sufficiently strict ordering for this architecture --
>      * others may require explicit memory barriers.
>      */
>
>     if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
>         return 1;
>
>     if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
>          !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
>                            &vcpu_info(v, evtchn_pending_sel)) )
>     {
>         vcpu_mark_events_pending(v);
>     }
>
>     /* Check if some VCPU might be polling for this event. */
>     if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
>         return 0;
>
>     /* Wake any interested (or potentially interested) pollers. */
>     for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
>           vcpuid < d->max_vcpus;
>           vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
>     {
>         v = d->vcpu[vcpuid];
>         if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
>              test_and_clear_bit(vcpuid, d->poll_mask) )
>         {
>             v->poll_evtchn = 0;
>             vcpu_unblock(v);
>
> --------------------------------------evtchn_unmask---------------------------
> int evtchn_unmask(unsigned int port)
> {
>     struct domain *d = current->domain;
>     struct vcpu   *v;
>
>     spin_lock(&d->event_lock);
>
>     if ( unlikely(!port_is_valid(d, port)) )
>     {
>         spin_unlock(&d->event_lock);
>         return -EINVAL;
>     }
>
>     v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id];
>
>     /*
>      * These operations must happen in strict order. Based on
>      * include/xen/event.h:evtchn_set_pending().
>      */
>     if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
>          test_bit          (port, &shared_info(d, evtchn_pending)) &&
>          !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
>                             &vcpu_info(v, evtchn_pending_sel)) )
>     {
>         vcpu_mark_events_pending(v);
>     }
>
>     spin_unlock(&d->event_lock);
>
>     return 0;
> }
> ----------------------------cpu_get_ioreq-------------------------
> static ioreq_t *cpu_get_ioreq(void)
> {
>     int i;
>     evtchn_port_t port;
>
>     port = xc_evtchn_pending(xce_handle);
>     if (port != -1) {
>         for ( i = 0; i < vcpus; i++ )
>             if ( ioreq_local_port[i] == port )
>                 break;
>
>         if ( i == vcpus ) {
>             fprintf(logfile, "Fatal error while trying to get io event!\n");
>             exit(1);
>         }
>
>         // unmask the wanted port again
>         xc_evtchn_unmask(xce_handle, port);
>
>         //get the io packet from shared memory
>         send_vcpu = i;
>         return __cpu_get_ioreq(i);
>     }
>
>     //read error or read nothing
>     return NULL;
> }
>
>