From: Andrew Cooper
Subject: Re: HVMs terminating as (null)
Date: Sat, 23 Nov 2013 20:26:11 +0000
Message-ID: <52910F63.4070705@citrix.com>
References: <5290D480.1000804@crc.id.au> <20131123192746.GA21689@aepfle.de> <52910427.5020507@crc.id.au> <52910853.1090708@crc.id.au> <52910A2F.50806@citrix.com> <52910B7F.6030503@crc.id.au>
In-Reply-To: <52910B7F.6030503@crc.id.au>
To: Steven Haigh
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 23/11/13 20:09, Steven Haigh wrote:
> On 24/11/13 07:03, Andrew Cooper wrote:
>> On 23/11/13 19:56, Steven Haigh wrote:
>>> On 24/11/13 06:38, Steven Haigh wrote:
>>>> On 24/11/13 06:27, Olaf Hering wrote:
>>>>> On Sun, Nov 24, Steven Haigh wrote:
>>>>>
>>>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>>>
>>>>> How exactly did you start the guests?
>>>>
>>>> The DomUs were started with: xl create /etc/xen/<configfile>
>>>>
>>>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>>>> What is the 'xenstore-ls -f | sort' output?
>>>>
>>>> I'll have to check this when I manage to reproduce it. So far, I have
>>>> been unable to find a reliable way to reproduce it. I managed to get a
>>>> system to do it every time an HVM DomU was shut down OR restarted - but
>>>> after a reboot of the Dom0 I can't get it into that state again.
>>>>
>>>> As soon as I can get a system in this state again, I'll leave it to see
>>>> what information I can extract.
>>>
>>> Ha! As always, as soon as I send this, I notice it's happened on a Dom0.
>>>
>>> # xl list
>>> Name                                        ID   Mem VCPUs      State   Time(s)
>>> Domain-0                                     0  1579     2     r-----    2731.3
>>> planner.vm                                   1  1013     1     -b----     189.3
>>> (null)                                       2     0     1     --psrd     301.1
>>> tracker.vm                                   3  1013     2     -b----     834.4
>>>
>>> Attached is the output of:
>>> # xl debug-keys q
>>> # xl dmesg  > xen-dmesg.log
>>> # gzip xen-dmesg.log
>>
>> Ok - from dmesg.
>>
>> (XEN) General information for domain 2:
>> (XEN)     refcnt=1 dying=2 pause_count=2
>> (XEN)     nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
>> (XEN)     handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
>> (XEN)     paging assistance: hap refcounts translate external
>> ...
>> (XEN) Memory pages belonging to domain 2:
>> (XEN)     DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
>> (XEN)     DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
>> (XEN)     PoD entries=0 cachesize=0
>>
>>
>> So there are indeed two outstanding pages causing this domain to become
>> a zombie.  They are normal pages, with 1 outstanding ref.
>>
>> Can you collect "xl debug-keys g" as well?
>
> Sure - attached.
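
For reference, the six-character State column of the (null) domain in the xl list output above can be decoded position by position. This is a minimal sketch that assumes the layout used by xl of this era: running, blocked, paused, shutdown, shutdown reason, dying, with the reason letter drawn from "-rscw" (poweroff, reboot, suspend, crash, watchdog). Under that assumption "--psrd" reads as paused, shut down for reboot, and dying. decode_xl_state is a hypothetical helper, not part of any Xen tool.

```python
# Hypothetical helper: decode the six-character State column of `xl list`.
# ASSUMED layout (xl of the Xen 4.x era): one flag per position, '-' when unset.
FLAG_NAMES = ("running", "blocked", "paused", "shutdown")
# ASSUMED shutdown-reason letters, indexed by the SHUTDOWN_* code.
REASON_LETTERS = {"-": "poweroff", "r": "reboot", "s": "suspend",
                  "c": "crash", "w": "watchdog"}


def decode_xl_state(state: str) -> list[str]:
    """Return the conditions encoded in an xl State string."""
    if len(state) != 6:
        raise ValueError(f"expected 6 characters, got {state!r}")
    # Positions 0-3 are simple on/off flags.
    result = [name for name, ch in zip(FLAG_NAMES, state[:4]) if ch != "-"]
    if state[3] == "s":
        # The fifth column only carries meaning once the domain has shut down.
        result.append("reason=" + REASON_LETTERS.get(state[4], "unknown"))
    if state[5] == "d":
        result.append("dying")
    return result


for name, state in [("Domain-0", "r-----"), ("(null)", "--psrd")]:
    print(name, decode_xl_state(state))
```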


(XEN)       -------- active --------       -------- shared --------
(XEN) [ref] localdom mfn      pin          localdom gmfn     flags
(XEN) grant-table for remote domain:    2 (v1)
(XEN) [16302]        0 0x0866e1 0x00000001          0 0x0064e1 0x19
(XEN) [16320]        0 0x0866e0 0x00000001          0 0x0064e0 0x19

Ok - so domain 2 has two outstanding grants.  This explains why it is a zombie.
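
As a sketch of how that count can be checked mechanically, the snippet below groups the entries of an "xl debug-keys g" dump by remote domain. It assumes the column layout shown above ([ref], then the active localdom/mfn/pin, then the shared localdom/gmfn/flags); outstanding_grants is a hypothetical helper, not part of any Xen tool.

```python
import re
from collections import defaultdict

# ASSUMED line shapes, taken from the `xl debug-keys g` dump quoted above.
HEADER = re.compile(r"grant-table for remote domain:\s+(\d+)")
ENTRY = re.compile(
    r"\[(\d+)\]\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)"
    r"\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)"
)


def outstanding_grants(dump: str) -> dict[int, list[dict]]:
    """Group grant entries by the domain whose grant table they belong to."""
    grants = defaultdict(list)
    domid = None
    for line in dump.splitlines():
        if m := HEADER.search(line):
            domid = int(m.group(1))
        elif (m := ENTRY.search(line)) and domid is not None:
            ref, localdom, mfn, pin, _, gmfn, flags = m.groups()
            grants[domid].append({"ref": int(ref), "localdom": int(localdom),
                                  "mfn": int(mfn, 16), "pin": int(pin, 16),
                                  "gmfn": int(gmfn, 16), "flags": int(flags, 16)})
    return dict(grants)


DUMP = """\
(XEN) grant-table for remote domain:    2 (v1)
(XEN) [16302]        0 0x0866e1 0x00000001          0 0x0064e1 0x19
(XEN) [16320]        0 0x0866e0 0x00000001          0 0x0064e0 0x19
"""

for domid, entries in outstanding_grants(DUMP).items():
    print(f"domain {domid}: {len(entries)} outstanding grant(s)",
          [hex(e["mfn"]) for e in entries])
```

Run against the two entries above, it reports two grants for domain 2 on mfns 0x866e1 and 0x866e0, the same two frames listed under "Memory pages belonging to domain 2" in the dmesg output.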

Both of these grants are GTF_writing | GTF_reading | GTF_permit_access, but seemingly unmapped.
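
For anyone decoding the flags column by hand: assuming the v1 grant-entry bit layout from Xen's public header xen/include/public/grant_table.h (entry type in the low two bits, GTF_readonly at bit 2, GTF_reading at bit 3, GTF_writing at bit 4), 0x19 splits as in this sketch. decode_gtf_flags is a hypothetical helper, and higher bits (cache attributes, sub-page) are not decoded.

```python
# ASSUMED v1 grant-entry flag values, mirroring xen/include/public/grant_table.h.
GTF_TYPE_MASK = 0x3
GTF_TYPES = {0: "GTF_invalid", 1: "GTF_permit_access",
             2: "GTF_accept_transfer", 3: "GTF_transitive"}
GTF_BITS = {1 << 2: "GTF_readonly", 1 << 3: "GTF_reading", 1 << 4: "GTF_writing"}


def decode_gtf_flags(flags: int) -> list[str]:
    """Split a shared grant-entry flags word into its symbolic parts."""
    # The entry type occupies the low two bits.
    names = [GTF_TYPES[flags & GTF_TYPE_MASK]]
    # Status bits; anything above GTF_writing is ignored in this sketch.
    names += [name for bit, name in GTF_BITS.items() if flags & bit]
    return names


print(" | ".join(decode_gtf_flags(0x19)))
```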

I will have to defer to someone who knows the grant code better.  Is it possible for a domain to be a zombie just because it has two grants it hasn't manually invalidated?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel