From: Keir Fraser <keir.fraser@eu.citrix.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: Re: VM hung after running sometime
Date: Tue, 21 Sep 2010 08:53:33 +0100
Message-ID: <C8BE230D.239BA%keir.fraser@eu.citrix.com>
In-Reply-To: <BAY121-W311ED332122059D811847DA7F0@phx.gbl>
On 21/09/2010 06:02, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Taking a look at domain 0's event channels on ports 105 and 106, I find
> that on port 105 the pending bit is 1 (in [1/0], the first field is the
> pending bit, which is 1; the second is the mask bit, which is 0).
>
> (XEN) 105 [1/0]: s=3 n=2 d=10 p=1 x=0
> (XEN) 106 [0/0]: s=3 n=2 d=10 p=2 x=0
>
> In all, we have a domain U vcpu blocked on _VPF_blocked_in_xen, and it
> must have set the pending bit.
> Given that pending is 1, it looks like the irq was never triggered, am I
> right? If it had been triggered, it would have cleared the pending bit
> (line 361).
Yes, it looks like dom0 is not handling the event for some reason. Qemu looks
like it still works and is waiting for a notification via select(). But that
won't happen until the dom0 kernel handles the event as an IRQ and calls the
relevant irq handler (drivers/xen/evtchn.c:evtchn_interrupt()).
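
From memory, that handler does little more than queue the port into a
per-process ring and wake anyone sleeping in select()/poll() on
/dev/xen/evtchn. A simplified sketch, paraphrased rather than verbatim, so
check the actual source in your tree:

    /* drivers/xen/evtchn.c -- simplified sketch, from memory */
    static irqreturn_t evtchn_interrupt(int irq, void *data)
    {
            unsigned int port = (unsigned long)data;
            struct per_user_data *u = port_user[port];

            disable_irq_nosync(irq);   /* stays masked until userspace unmasks */

            if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
                    u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
                    wmb();             /* ring contents visible before prod update */
                    if (u->ring_cons == u->ring_prod++)
                            wake_up_interruptible(&u->evtchn_wait);
            } else
                    u->ring_overflow = 1;

            return IRQ_HANDLED;
    }

If that never runs, nothing lands in the ring and select() never sees the
evtchn fd become readable -- which would match the timeouts in your strace
below.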
I think you're on the right track in your debugging. I don't know much about
the pv_ops irq handling path, except to say that this aspect differs from
non-pv_ops kernels, which rather more special-case the handling of events
bound to user space. So at the moment my best guess would be that the bug is
in the pv_ops kernel's irq handling for this type of user-space-bound event.
-- Keir
> ------------------------------/linux-2.6-pvops.git/kernel/irq/chip.c---
> 354 void
> 355 handle_level_irq(unsigned int irq, struct irq_desc *desc)
> 356 {
> 357 struct irqaction *action;
> 358 irqreturn_t action_ret;
> 359
> 360 spin_lock(&desc->lock);
> 361 mask_ack_irq(desc, irq);
> 362
> 363 if (unlikely(desc->status & IRQ_INPROGRESS))
> 364 goto out_unlock;
> 365 desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
> 366 kstat_incr_irqs_this_cpu(irq, desc);
> 367
>
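
One note on that code: for an event-channel irq, the mask_ack_irq() at line
361 ends up in the evtchn irq_chip's mask and ack hooks, and the ack hook is
what actually clears the pending bit in the shared-info page. Roughly, again
from memory (the pvops tree may differ slightly):

    /* drivers/xen/events.c -- rough sketch */
    static void ack_dynirq(unsigned int irq)
    {
            int evtchn = evtchn_from_irq(irq);

            move_native_irq(irq);

            if (VALID_EVTCHN(evtchn))
                    clear_evtchn(evtchn);  /* clears shared_info evtchn_pending bit */
    }

So a pending bit that stays set on port 105 is consistent with
handle_level_irq() never being entered for that irq at all.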
> BTW, qemu still works fine while the VM is hung. Below is its strace
> output. There is not much difference from other, healthy qemu instances,
> other than that every select() call times out.
> -------------------
> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> clock_gettime(CLOCK_MONOTONIC, {673470, 59535265}) = 0
> clock_gettime(CLOCK_MONOTONIC, {673470, 59629728}) = 0
> clock_gettime(CLOCK_MONOTONIC, {673470, 59717700}) = 0
> clock_gettime(CLOCK_MONOTONIC, {673470, 59806552}) = 0
> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> clock_gettime(CLOCK_MONOTONIC, {673470, 70234406}) = 0
> clock_gettime(CLOCK_MONOTONIC, {673470, 70332116}) = 0
> clock_gettime(CLOCK_MONOTONIC, {673470, 70419835}) = 0
>
>
>
>
>> Date: Mon, 20 Sep 2010 10:35:46 +0100
>> Subject: Re: VM hung after running sometime
>> From: keir.fraser@eu.citrix.com
>> To: tinnycloud@hotmail.com
>> CC: xen-devel@lists.xensource.com; jbeulich@novell.com
>>
>> On 20/09/2010 10:15, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
>>
>>> Thanks Keir.
>>>
>>> You're right: after I looked deeply into wait_on_xen_event_channel, I see
>>> it is smart enough to avoid the race I assumed.
>>>
>>> How about prepare_wait_on_xen_event_channel? I still don't know when it
>>> is invoked. Could you enlighten me?
>>
>> As you can see, it is called from hvm_send_assist_req(), hence it is called
>> whenever an ioreq is sent to qemu-dm. Note that it is called *before*
>> qemu-dm is notified -- hence it cannot race the wakeup from qemu, as we will
>> not get woken until qemu-dm has done the work, and it cannot start the work
>> until it is notified, and it is not notified until after
>> prepare_wait_on_xen_event_channel has been executed.
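
To make that ordering concrete: hvm_send_assist_req() is roughly the
following, and prepare_wait_on_xen_event_channel() itself is just a macro
that sets the blocked flag. This is paraphrased from the Xen sources, so
treat it as a sketch rather than the exact code:

    /* xen/arch/x86/hvm/hvm.c -- paraphrased */
    void hvm_send_assist_req(struct vcpu *v)
    {
        ioreq_t *p = get_ioreq(v);

        /* Mark ourselves blocked *before* qemu-dm can possibly notify us... */
        prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);

        /* ...and only then publish the request and kick qemu-dm. */
        p->state = STATE_IOREQ_READY;
        notify_via_xen_event_channel(v->arch.hvm_vcpu.xen_port);
    }

    /* xen/include/xen/event.h -- roughly */
    #define prepare_wait_on_xen_event_channel(port) \
        set_bit(_VPF_blocked_in_xen, &current->pause_flags)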
>>
>> -- Keir
>>
>>>
>>>> Date: Mon, 20 Sep 2010 08:45:21 +0100
>>>> Subject: Re: VM hung after running sometime
>>>> From: keir.fraser@eu.citrix.com
>>>> To: tinnycloud@hotmail.com
>>>> CC: xen-devel@lists.xensource.com; jbeulich@novell.com
>>>>
>>>> On 20/09/2010 07:00, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
>>>>
>>>>> When IO is not ready, domain U in VMEXIT->hvm_do_resume might invoke
>>>>> wait_on_xen_event_channel (where it is blocked on _VPF_blocked_in_xen).
>>>>>
>>>>> Here is my assumption of how the event gets missed.
>>>>>
>>>>> Step 1: hvm_do_resume executes line 260, and suppose p->state is
>>>>> STATE_IOREQ_READY or STATE_IOREQ_INPROCESS.
>>>>> Step 2: then cpu_handle_ioreq, at line 547, executes line 548 so quickly
>>>>> that it completes before hvm_do_resume executes line 270.
>>>>> Well, the event is missed.
>>>>> In other words, _VPF_blocked_in_xen is cleared before it is actually
>>>>> set, and domain U, which is blocked, might never get unblocked. Is this
>>>>> possible?
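
For reference, the loop in hvm_do_resume() that those line numbers fall
within looks roughly like this (paraphrased; line numbers approximate):

    /* xen/arch/x86/hvm/hvm.c -- paraphrased */
    void hvm_do_resume(struct vcpu *v)
    {
        ioreq_t *p = get_ioreq(v);

        /* NB. Optimised for the common case (p->state == STATE_IOREQ_NONE). */
        while ( p->state != STATE_IOREQ_NONE )
        {
            switch ( p->state )
            {
            case STATE_IORESP_READY:  /* IORESP_READY -> NONE */
                hvm_io_assist();
                break;
            case STATE_IOREQ_READY:   /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
            case STATE_IOREQ_INPROCESS:
                wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
                                          (p->state != STATE_IOREQ_READY) &&
                                          (p->state != STATE_IOREQ_INPROCESS));
                break;
            default:
                domain_crash(v->domain);  /* weird ioreq state */
                return;
            }
        }
    }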
>>>>
>>>> Firstly, that code is very paranoid and it should never actually be the
>>>> case
>>>> that we see STATE_IOREQ_READY or STATE_IOREQ_INPROCESS in hvm_do_resume().
>>>> Secondly, even if you do, take a look at the implementation of
>>>> wait_on_xen_event_channel() -- it is smart enough to avoid the race you
>>>> mention.
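
The trick is in the macro itself. From memory it is roughly the following
(see xen/include/xen/event.h for the real thing):

    #define wait_on_xen_event_channel(port, condition)                  \
        do {                                                            \
            if ( condition )                                            \
                break;                                                  \
            set_bit(_VPF_blocked_in_xen, &current->pause_flags);        \
            mb(); /* set blocked status /then/ re-evaluate condition */ \
            if ( condition )                                            \
                clear_bit(_VPF_blocked_in_xen, &current->pause_flags);  \
            else                                                        \
                raise_softirq(SCHEDULE_SOFTIRQ);                        \
        } while ( 0 )

Because the condition is re-evaluated after the blocked flag is set, a
notification that races with going to sleep causes us to clear the flag and
carry on, rather than being lost.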
>>>>
>>>> -- Keir
>>>>
>>>>
>>>
>>
>>
>