From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: HVMs terminating as (null)
Date: Sat, 23 Nov 2013 20:03:59 +0000
Message-ID: <52910A2F.50806@citrix.com>
References: <5290D480.1000804@crc.id.au> <20131123192746.GA21689@aepfle.de> <52910427.5020507@crc.id.au> <52910853.1090708@crc.id.au>
In-Reply-To: <52910853.1090708@crc.id.au>
To: Steven Haigh
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 23/11/13 19:56, Steven Haigh wrote:
> On 24/11/13 06:38, Steven Haigh wrote:
>> On 24/11/13 06:27, Olaf Hering wrote:
>>> On Sun, Nov 24, Steven Haigh wrote:
>>>
>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>
>>> How exactly did you start the guests?
>>
>> The DomUs were started with: xl create /etc/xen/<configfile>
>>
>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>> What is the 'xenstore-ls -f | sort' output?
>>
>> I'll have to check this when I manage to reproduce it. So far, I have
>> been unable to get a reliable way to reproduce it. I managed to get a
>> system to do it every time a HVM DomU was shutdown OR restarted - but
>> after a reboot of the Dom0 I can't get it into that state again.
>>
>> As soon as I can get a system in this state again, I'll leave it to see
>> what information I can extract.
>
> Ha! As always, as soon as I send this, I notice it's happened on a Dom0.
>
> # xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  1579     2     r-----    2731.3
> planner.vm                                   1  1013     1     -b----     189.3
> (null)                                       2     0     1     --psrd     301.1
> tracker.vm                                   3  1013     2     -b----     834.4
>
> Attached is the output of:
> # xl debug-keys q
> # xl dmesg  > xen-dmesg.log
> # gzip xen-dmesg.log


Ok - from dmesg.

(XEN) General information for domain 2:
(XEN)     refcnt=1 dying=2 pause_count=2
(XEN)     nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
(XEN)     handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
(XEN)     paging assistance: hap refcounts translate external
...
(XEN) Memory pages belonging to domain 2:
(XEN)     DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
(XEN)     DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
(XEN)     PoD entries=0 cachesize=0


So there are indeed two outstanding pages (nr_pages=2) keeping this domain around as a zombie.  They are normal pages, each with one outstanding reference (caf=00000001).
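
If you want to pull the same information out of future dumps without reading them by eye, something along these lines might work. It is only a sketch: parse_domains() and zombies() are names made up for illustration, and it assumes the 'q' output keeps exactly the layout quoted above, which may differ between Xen versions. It reports the caf value as printed and does not try to decode any flag bits.

```python
import re

# Section headers printed by the 'q' debug key (layout assumed from the excerpt above).
HEADER_RE = re.compile(
    r"\(XEN\) (General information for|Memory pages belonging to) domain (\d+):"
)
# key=value tokens such as dying=2 or nr_pages=2.
FIELD_RE = re.compile(r"(\w+)=(\S+)")
# One page line: frame number, caf and taf, all printed in hex.
PAGE_RE = re.compile(
    r"\(XEN\)\s+DomPage ([0-9a-f]+): caf=([0-9a-f]+), taf=([0-9a-f]+)"
)


def parse_domains(dmesg_text):
    """Return {domid: {"fields": {...}, "pages": [(frame, caf, taf), ...]}}."""
    domains = {}
    current = None
    in_general = False
    for line in dmesg_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            domid = int(header.group(2))
            current = domains.setdefault(domid, {"fields": {}, "pages": []})
            in_general = header.group(1).startswith("General")
            continue
        if current is None:
            continue
        page = PAGE_RE.search(line)
        if page:
            frame, caf, taf = page.groups()
            current["pages"].append((frame, int(caf, 16), int(taf, 16)))
        elif in_general:
            # Keep the first value seen for each key in the general section.
            for key, value in FIELD_RE.findall(line):
                current["fields"].setdefault(key, value)
    return domains


def zombies(domains):
    """Return (domid, nr_pages, pages) for dying domains that still own pages."""
    found = []
    for domid, info in sorted(domains.items()):
        fields = info["fields"]
        if int(fields.get("dying", "0")) != 0 and info["pages"]:
            found.append((domid, int(fields.get("nr_pages", "0")), info["pages"]))
    return found


if __name__ == "__main__":
    import sys

    # Usage (assumed): xl debug-keys q; xl dmesg | python3 this_script.py
    for domid, nr_pages, pages in zombies(parse_domains(sys.stdin.read())):
        print("domain %d: nr_pages=%d" % (domid, nr_pages))
        for frame, caf, taf in pages:
            print("  frame %s caf=%#x taf=%#x" % (frame, caf, taf))
```

Any domain it lists is one that Xen considers dying but that still has pages accounted to it, which is the situation domain 2 is in here.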

Can you collect "xl debug-keys g" as well?

~Andrew

