Re: VM hung after running sometime

xen-devel.lists.xenproject.org archive mirror

From: Keir Fraser <keir.fraser@eu.citrix.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: Re: VM hung after running sometime
Date: Sun, 19 Sep 2010 12:49:44 +0100
Message-ID: <C8BBB768.2379E%keir.fraser@eu.citrix.com>
In-Reply-To: <BAY121-W13F714A682D65CDD4B5C74DA7D0@phx.gbl>

On 19/09/2010 11:37, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:

> Hi Keir:
>
>       Regarding the HVM hang: according to our recent tests, this issue
> still exists. While going through the code, I observed something
> abnormal and need your help.
>
>       We've noticed that when a VM hangs, its VCPU flags value is always 4,
> which indicates _VPF_blocked_in_xen; this flag is set in
> prepare_wait_on_xen_event_channel. I've also noticed that Domain U sets
> up an event channel with domain 0 for each VCPU, and qemu-dm selects on
> the event fd.
>
>       notify_via_xen_event_channel is called when Domain U issues a
> request. qemu-dm then gets the event and invokes
> cpu_handle_ioreq (/xen-4.0.0/tools/ioemu-qemu-xen/i386-dm/helper2.c)
> -> cpu_get_ioreq() -> xc_evtchn_unmask(). evtchn_unmask operates on
> evtchn_pending, evtchn_mask, and evtchn_pending_sel.
>
>       My confusion is about notify_via_xen_event_channel() ->
> evtchn_set_pending: the **evtchn_set_pending here is not locked**, yet
> it also operates on evtchn_pending, evtchn_mask, and evtchn_pending_sel.

Atomic ops are used to make the operations on evtchn_pending, evtchn_mask,
and evtchn_pending_sel concurrency safe. Note that the locking from
notify_via_xen_event_channel() is just the same as, say, from evtchn_send():
the local domain's (i.e. DomU's, in this case) event_lock is held, while the
remote domain's (i.e. dom0's, in this case) does not need to be held.
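
To illustrate why the atomic ordering is enough without the remote lock,
here is a minimal sketch, not the Xen source. It assumes C11 <stdatomic.h>
and single-word bitmaps; every sk_* name and the upcalls counter are
hypothetical stand-ins for the shared_info/vcpu_info fields and for
vcpu_mark_events_pending().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-word stand-ins for evtchn_pending, evtchn_mask
 * and evtchn_pending_sel. */
static atomic_ulong sk_pending, sk_mask, sk_pending_sel;
/* Counts simulated vcpu_mark_events_pending() calls. */
static atomic_int upcalls;

/* Atomically set bit n and return its previous value. */
static bool sk_test_and_set(atomic_ulong *w, unsigned n)
{
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_fetch_or(w, 1UL << n) >> n) & 1UL;
}

/* Atomically clear bit n and return its previous value. */
static bool sk_test_and_clear(atomic_ulong *w, unsigned n)
{
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_fetch_and(w, ~(1UL << n)) >> n) & 1UL;
}

/* Read bit n. */
static bool sk_test_bit(atomic_ulong *w, unsigned n)
{
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_load(w) >> n) & 1UL;
}

/* Sender side, same order as evtchn_set_pending():
 * set pending first, then look at the mask. */
static void sk_set_pending(unsigned port)
{
    if (sk_test_and_set(&sk_pending, port))
        return;                       /* already pending, nothing to do */
    if (!sk_test_bit(&sk_mask, port) &&
        !sk_test_and_set(&sk_pending_sel, 0))
        atomic_fetch_add(&upcalls, 1);
}

/* Receiver side, same order as evtchn_unmask():
 * clear the mask first, then look at pending. */
static void sk_unmask(unsigned port)
{
    if (sk_test_and_clear(&sk_mask, port) &&
        sk_test_bit(&sk_pending, port) &&
        !sk_test_and_set(&sk_pending_sel, 0))
        atomic_fetch_add(&upcalls, 1);
}
```

Whichever side runs second sees the other side's write: if the sender sets
pending before the unmask tests it, the unmask path raises the upcall; if
the mask is cleared first, the sender finds it clear and raises the upcall
itself. The test-and-set on the selector bit keeps the upcall from being
raised twice.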

If your domU is stuck in state _VPF_blocked_in_xen, it probably means
qemu-dm is toast. I would investigate whether the qemu-dm process is still
present, still doing useful work, etc.

 -- Keir

>       I'm afraid this race might cause an event from dom U to go
> undelivered to qemu-dm, but I am not sure, since I still do not fully
> understand where event_mask is set and where event_pending is cleared.
>  
> -------------------------notify_via_xen_event_channel-------------------------
> void notify_via_xen_event_channel(int lport)
> {
>     struct evtchn *lchn, *rchn;
>     struct domain *ld = current->domain, *rd;
>     int            rport;
>
>     spin_lock(&ld->event_lock);
>
>     ASSERT(port_is_valid(ld, lport));
>     lchn = evtchn_from_port(ld, lport);
>     ASSERT(lchn->consumer_is_xen);
>
>     if ( likely(lchn->state == ECS_INTERDOMAIN) )
>     {
>         rd    = lchn->u.interdomain.remote_dom;
>         rport = lchn->u.interdomain.remote_port;
>         rchn  = evtchn_from_port(rd, rport);
>         evtchn_set_pending(rd->vcpu[rchn->notify_vcpu_id], rport);
>     }
>
>     spin_unlock(&ld->event_lock);
> }
>       
> ----------------------------evtchn_set_pending----------------------
> static int evtchn_set_pending(struct vcpu *v, int port)
> {
>     struct domain *d = v->domain;
>     int vcpuid;
>
>     /*
>      * The following bit operations must happen in strict order.
>      * NB. On x86, the atomic bit operations also act as memory barriers.
>      * There is therefore sufficiently strict ordering for this architecture --
>      * others may require explicit memory barriers.
>      */
>
>     if ( test_and_set_bit(port, &shared_info(d, evtchn_pending)) )
>         return 1;
>
>     if ( !test_bit        (port, &shared_info(d, evtchn_mask)) &&
>          !test_and_set_bit(port / BITS_PER_EVTCHN_WORD(d),
>                            &vcpu_info(v, evtchn_pending_sel)) )
>     {
>         vcpu_mark_events_pending(v);
>     }
>
>     /* Check if some VCPU might be polling for this event. */
>     if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
>         return 0;
>
>     /* Wake any interested (or potentially interested) pollers. */
>     for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
>           vcpuid < d->max_vcpus;
>           vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
>     {
>         v = d->vcpu[vcpuid];
>         if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
>              test_and_clear_bit(vcpuid, d->poll_mask) )
>         {
>             v->poll_evtchn = 0;
>             vcpu_unblock(v);
>    
> --------------------------------------evtchn_unmask------------------------------
>
> int evtchn_unmask(unsigned int port)
> {
>     struct domain *d = current->domain;
>     struct vcpu   *v;
>
>     spin_lock(&d->event_lock);
>
>     if ( unlikely(!port_is_valid(d, port)) )
>     {
>         spin_unlock(&d->event_lock);
>         return -EINVAL;
>     }
>
>     v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id];
>
>     /*
>      * These operations must happen in strict order. Based on
>      * include/xen/event.h:evtchn_set_pending().
>      */
>     if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
>          test_bit          (port, &shared_info(d, evtchn_pending)) &&
>          !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
>                             &vcpu_info(v, evtchn_pending_sel)) )
>     {
>         vcpu_mark_events_pending(v);
>     }
>
>     spin_unlock(&d->event_lock);
>
>     return 0;
> }
> ----------------------------cpu_get_ioreq-------------------------
> static ioreq_t *cpu_get_ioreq(void)
> {
>     int i;
>     evtchn_port_t port;
>
>     port = xc_evtchn_pending(xce_handle);
>     if (port != -1) {
>         for ( i = 0; i < vcpus; i++ )
>             if ( ioreq_local_port[i] == port )
>                 break;
>
>         if ( i == vcpus ) {
>             fprintf(logfile, "Fatal error while trying to get io event!\n");
>             exit(1);
>         }
>
>         // unmask the wanted port again
>         xc_evtchn_unmask(xce_handle, port);
>
>         //get the io packet from shared memory
>         send_vcpu = i;
>         return __cpu_get_ioreq(i);
>     }
>
>     //read error or read nothing
>     return NULL;
> }
>        
>
