* HVMs terminating as (null)
@ 2013-11-23 16:14 Steven Haigh
2013-11-23 16:18 ` Andrew Cooper
2013-11-23 19:27 ` Olaf Hering
0 siblings, 2 replies; 19+ messages in thread
From: Steven Haigh @ 2013-11-23 16:14 UTC (permalink / raw)
To: xen-devel
Hi all,
Running Xen 4.2.3 with all the current XSA fixes.
Whenever I shut down or reboot a Windows HVM DomU, it ends up in a
(null) state, which I can't seem to kill, destroy, or clean up.
# xl list
Name              ID   Mem VCPUs  State    Time(s)
Domain-0           0  1579     2  r-----  322927.2
(null)             1     0     1  --psrd   14075.9
(null)             2     0     1  --psrd   58467.6
(null)             3     0     0  --ps-d   11604.8
(null)             4     0     2  --p--d   24186.1
(null)             5     0     2  --ps-d   22831.0
The config is very simple:
# cat /etc/xen/remotedesktop.vm
name = "remotedesktop.vm"
memory = 1536
vcpus = 2
cpus = "1-3"
cpu_weight = 128
disk = [ 'phy:/dev/vg_raid1/remotedesktop.vm,hda,w' ,
'file:/root/win7x86.iso,hdc:cdrom,r' ]
vif = [ 'mac=98:95:00:07:07:07, bridge=br203, vifname=vm.rdp' ]
builder = "hvm"
usbdevice = "tablet"
vnc = 1
vnclisten = "10.1.1.1"
vncdisplay = 1 # port 5901
vncpasswd = ''
localtime = 1
viridian = 1
xen_platform_pci= 1
It seems that this happens no matter what I do.
Has anyone come across this before?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: HVMs terminating as (null)
2013-11-23 16:14 HVMs terminating as (null) Steven Haigh
@ 2013-11-23 16:18 ` Andrew Cooper
2013-11-23 16:38 ` Sander Eikelenboom
2013-11-23 19:27 ` Olaf Hering
1 sibling, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2013-11-23 16:18 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 23/11/13 16:14, Steven Haigh wrote:
> Hi all,
>
> Running Xen 4.2.3 with all the current XSA fixes.
>
> Whenever I shutdown / reboot a Windows HVM DomU, it ends up going into a
> (null) state - which I can't seem to kill / destroy / clean up from.
>
> # xl list
> Name              ID   Mem VCPUs  State    Time(s)
> Domain-0           0  1579     2  r-----  322927.2
> (null)             1     0     1  --psrd   14075.9
> (null)             2     0     1  --psrd   58467.6
> (null)             3     0     0  --ps-d   11604.8
> (null)             4     0     2  --p--d   24186.1
> (null)             5     0     2  --ps-d   22831.0
>
> The config is very simple:
> # cat /etc/xen/remotedesktop.vm
> name = "remotedesktop.vm"
> memory = 1536
> vcpus = 2
> cpus = "1-3"
> cpu_weight = 128
> disk = [ 'phy:/dev/vg_raid1/remotedesktop.vm,hda,w' ,
> 'file:/root/win7x86.iso,hdc:cdrom,r' ]
> vif = [ 'mac=98:95:00:07:07:07, bridge=br203, vifname=vm.rdp' ]
> builder = "hvm"
> usbdevice = "tablet"
> vnc = 1
> vnclisten = "10.1.1.1"
> vncdisplay = 1 # port 5901
> vncpasswd = ''
> localtime = 1
> viridian = 1
> xen_platform_pci= 1
>
> It seems that this happens no matter what I do.
>
> Has anyone come across this before?
When you have a system in this state, can you run
xl debug-keys q
xl dmesg > xen-dmesg.log
And provide the log file.
Most likely, there will be one unfreed page keeping the domain around as
a zombie.
~Andrew
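The collection steps Andrew asks for can be wrapped in a small helper. The following is only a sketch: the function name, the `keys` tuple, and the injectable `run` parameter are assumptions made for illustration, while `xl debug-keys` and `xl dmesg` are the commands named in this thread. It needs to run as root in Dom0.

```python
import subprocess


def collect_xen_diagnostics(log_path="xen-dmesg.log", keys=("q",), run=subprocess.run):
    """Trigger Xen debug keys, then save the hypervisor console ring to log_path.

    `run` is injectable (hypothetical convenience) so the helper can be
    exercised without a Xen host.
    """
    for key in keys:
        # 'xl debug-keys <key>' asks Xen to dump state into its console ring.
        run(["xl", "debug-keys", key], check=True)
    # 'xl dmesg' prints the console ring, which now contains the dumps.
    result = run(["xl", "dmesg"], check=True, capture_output=True, text=True)
    with open(log_path, "w") as log:
        log.write(result.stdout)
    return log_path
```

Calling `collect_xen_diagnostics(keys=("q", "g"))` would gather both the domain dump and the grant-table dump requested later in the thread in one log file.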
* Re: HVMs terminating as (null)
2013-11-23 16:18 ` Andrew Cooper
@ 2013-11-23 16:38 ` Sander Eikelenboom
2013-11-25 10:36 ` Ian Campbell
0 siblings, 1 reply; 19+ messages in thread
From: Sander Eikelenboom @ 2013-11-23 16:38 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel, Steven Haigh
Saturday, November 23, 2013, 5:18:43 PM, you wrote:
> On 23/11/13 16:14, Steven Haigh wrote:
>> Hi all,
>>
>> Running Xen 4.2.3 with all the current XSA fixes.
>>
>> Whenever I shutdown / reboot a Windows HVM DomU, it ends up going into a
>> (null) state - which I can't seem to kill / destroy / clean up from.
>>
>> # xl list
>> Name              ID   Mem VCPUs  State    Time(s)
>> Domain-0           0  1579     2  r-----  322927.2
>> (null)             1     0     1  --psrd   14075.9
>> (null)             2     0     1  --psrd   58467.6
>> (null)             3     0     0  --ps-d   11604.8
>> (null)             4     0     2  --p--d   24186.1
>> (null)             5     0     2  --ps-d   22831.0
>>
>> The config is very simple:
>> # cat /etc/xen/remotedesktop.vm
>> name = "remotedesktop.vm"
>> memory = 1536
>> vcpus = 2
>> cpus = "1-3"
>> cpu_weight = 128
>> disk = [ 'phy:/dev/vg_raid1/remotedesktop.vm,hda,w' ,
>> 'file:/root/win7x86.iso,hdc:cdrom,r' ]
>> vif = [ 'mac=98:95:00:07:07:07, bridge=br203, vifname=vm.rdp' ]
>> builder = "hvm"
>> usbdevice = "tablet"
>> vnc = 1
>> vnclisten = "10.1.1.1"
>> vncdisplay = 1 # port 5901
>> vncpasswd = ''
>> localtime = 1
>> viridian = 1
>> xen_platform_pci= 1
>>
>> It seems that this happens no matter what I do.
>>
>> Has anyone come across this before?
> When you have a system in this state, can you run
> xl debug-keys q
> xl dmesg > xen-dmesg.log
> And provide the log file.
> Most likely, there will be one unfreed page keeping the domain around as
> a zombie.
> ~Andrew
Would it be possible to set the domain name to something other than "(null)" when such a state occurs?
The xendomains script, for example, seems to interpret this literally and bails out without shutting down
any other domains.
--
Sander
* Re: HVMs terminating as (null)
2013-11-23 16:38 ` Sander Eikelenboom
@ 2013-11-25 10:36 ` Ian Campbell
2013-11-25 10:47 ` Steven Haigh
2013-11-25 10:50 ` Sander Eikelenboom
0 siblings, 2 replies; 19+ messages in thread
From: Ian Campbell @ 2013-11-25 10:36 UTC (permalink / raw)
To: Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel, Steven Haigh
On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
> any other domains.
I guess it should be a one liner, so please submit a patch. Not sure
what alternative string should be used, since you would want to avoid
clashing with any potential real domain's name.
From that PoV it might be better to teach xendomains to ignore such
domains.
Ian.
* Re: HVMs terminating as (null)
2013-11-25 10:36 ` Ian Campbell
@ 2013-11-25 10:47 ` Steven Haigh
2013-11-25 10:50 ` Sander Eikelenboom
1 sibling, 0 replies; 19+ messages in thread
From: Steven Haigh @ 2013-11-25 10:47 UTC (permalink / raw)
To: xen-devel
On 25/11/13 21:36, Ian Campbell wrote:
> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
>> any other domains.
>
> I guess it should be a one liner, so please submit a patch. Not sure
> what alternative string should be used, since you would want to avoid
> clashing with any potential real domain's name.
>
> From that PoV it might be better to teach xendomains to ignore such
> domains.
This is how I actually found this problem in the first place:
xendomains (I rewrote the default script) waited until the failsafe
timeout before it rebooted the system.
I could filter out DomUs that have (null) as the 'name', but I wasn't
sure of the correct course of action here.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
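The filtering Steven describes might look roughly like the Python sketch below. It parses `xl list` output and returns the numeric IDs of guests a shutdown script could act on, skipping Dom0 and any zombie. The function names and the sample text (taken from a later listing in this thread) are illustrative assumptions. Treating a `d` (dying) flag in the State column as the zombie marker, with the literal name `(null)` as a fallback, is one possible policy and not something the thread settles on.

```python
XL_LIST_SAMPLE = """\
Name              ID   Mem VCPUs  State    Time(s)
Domain-0           0  1579     2  r-----    2731.3
planner.vm         1  1013     1  -b----     189.3
(null)             2     0     1  --psrd     301.1
tracker.vm         3  1013     2  -b----     834.4
"""


def parse_xl_list(text):
    """Parse 'xl list' output into a list of dicts, one per domain."""
    domains = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # Split from the right so an unusual name cannot shift the columns.
        name, domid, mem, vcpus, state, time_s = line.strip().rsplit(None, 5)
        domains.append({"name": name, "id": int(domid), "mem": int(mem),
                        "vcpus": int(vcpus), "state": state, "time": float(time_s)})
    return domains


def is_zombie(domain):
    # 'd' in the State column marks a dying domain; "(null)" is the name
    # xl printed for the nameless domains reported in this thread.
    return "d" in domain["state"] or domain["name"] == "(null)"


def live_guest_ids(text):
    """Return IDs of guests to shut down: everything except Dom0 and zombies."""
    return [d["id"] for d in parse_xl_list(text)
            if d["id"] != 0 and not is_zombie(d)]


print(live_guest_ids(XL_LIST_SAMPLE))
```

Working with numeric IDs instead of names sidesteps the question of what a nameless domain should be called, at the cost of the script output being less readable.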
* Re: HVMs terminating as (null)
2013-11-25 10:36 ` Ian Campbell
2013-11-25 10:47 ` Steven Haigh
@ 2013-11-25 10:50 ` Sander Eikelenboom
2013-11-25 10:56 ` Steven Haigh
2013-11-25 11:08 ` Ian Campbell
1 sibling, 2 replies; 19+ messages in thread
From: Sander Eikelenboom @ 2013-11-25 10:50 UTC (permalink / raw)
To: Ian Campbell; +Cc: Andrew Cooper, xen-devel, Steven Haigh
Monday, November 25, 2013, 11:36:05 AM, you wrote:
> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
>> any other domains.
> I guess it should be a one liner, so please submit a patch. Not sure
> what alternative string should be used, since you would want to avoid
> clashing with any potential real domain's name.
I didn't immediately spot the place where it was set to "(null)".
Yes, that's a problem, though domain naming already has other restrictions in the "just don't do that" category (like using "0", or any other number that is also a domain ID, as a domain name).
> From that PoV it might be better to teach xendomains to ignore such
> domains.
From what I remember, I also couldn't use "xl destroy" on such a domain (though I probably should have tried the domain number instead of the name).
Perhaps the tool scripts should just use the domain ID numbers instead of names for anything except printk's and echoing to the user?
> Ian.
* Re: HVMs terminating as (null)
2013-11-25 10:50 ` Sander Eikelenboom
@ 2013-11-25 10:56 ` Steven Haigh
2013-11-25 11:02 ` Ian Campbell
2013-11-25 11:02 ` Andrew Cooper
2013-11-25 11:08 ` Ian Campbell
1 sibling, 2 replies; 19+ messages in thread
From: Steven Haigh @ 2013-11-25 10:56 UTC (permalink / raw)
To: xen-devel
On 25/11/13 21:50, Sander Eikelenboom wrote:
>
> Monday, November 25, 2013, 11:36:05 AM, you wrote:
>
>> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
>>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
>>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
>>> any other domains.
>
>> I guess it should be a one liner, so please submit a patch. Not sure
>> what alternative string should be used, since you would want to avoid
>> clashing with any potential real domain's name.
>
> I didn't immediately spot the place where it was set to "null".
> Yes that's a problem, though domainnaming has more restrictions (like using "0" (or any other number that is also a domain-id) as domainname) in the "just don't do that" category.
>
>> From that PoV it might be better to teach xendomains to ignore such
>> domains.
>
> From what i remember i also couldn't use "xl destroy" on such a domain (though i probably should by using the domain number instead of the name).
> Perhaps the toolscripts should just uses the domain-id numbers instead of names for anything except printk's and echoing to the user ?
Correct - once a domain enters the (null) state, you cannot use 'xl
destroy' to kill the domain. As in my first post, the domain ID still
exists, but it cannot be used. Is this a toolset bug?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: HVMs terminating as (null)
2013-11-25 10:56 ` Steven Haigh
@ 2013-11-25 11:02 ` Ian Campbell
2013-11-25 11:02 ` Andrew Cooper
1 sibling, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2013-11-25 11:02 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On Mon, 2013-11-25 at 21:56 +1100, Steven Haigh wrote:
> On 25/11/13 21:50, Sander Eikelenboom wrote:
> >
> > Monday, November 25, 2013, 11:36:05 AM, you wrote:
> >
> >> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
> >>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
> >>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
> >>> any other domains.
> >
> >> I guess it should be a one liner, so please submit a patch. Not sure
> >> what alternative string should be used, since you would want to avoid
> >> clashing with any potential real domain's name.
> >
> > I didn't immediately spot the place where it was set to "null".
> > Yes that's a problem, though domainnaming has more restrictions (like using "0" (or any other number that is also a domain-id) as domainname) in the "just don't do that" category.
> >
> >> From that PoV it might be better to teach xendomains to ignore such
> >> domains.
> >
> > From what i remember i also couldn't use "xl destroy" on such a domain (though i probably should by using the domain number instead of the name).
> > Perhaps the toolscripts should just uses the domain-id numbers instead of names for anything except printk's and echoing to the user ?
>
> Correct - once a domain enters the (null) state, you cannot use 'xl
> destroy' to kill the domain. As in my first post, the domain ID still
> exists, but it cannot be used. Is this a toolset bug?
No. It is not possible for the toolstack to kill a domain which is in
this state. If it were, the domain would have died, but a memory
reference is keeping it alive and there is nothing the toolstack can do
about that.
Ian.
* Re: HVMs terminating as (null)
2013-11-25 10:56 ` Steven Haigh
2013-11-25 11:02 ` Ian Campbell
@ 2013-11-25 11:02 ` Andrew Cooper
1 sibling, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2013-11-25 11:02 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 25/11/13 10:56, Steven Haigh wrote:
> On 25/11/13 21:50, Sander Eikelenboom wrote:
>>
>> Monday, November 25, 2013, 11:36:05 AM, you wrote:
>>
>>> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
>>>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
>>>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
>>>> any other domains.
>>
>>> I guess it should be a one liner, so please submit a patch. Not sure
>>> what alternative string should be used, since you would want to avoid
>>> clashing with any potential real domain's name.
>>
>> I didn't immediately spot the place where it was set to "null".
>> Yes that's a problem, though domainnaming has more restrictions (like using "0" (or any other number that is also a domain-id) as domainname) in the "just don't do that" category.
>>
>>> From that PoV it might be better to teach xendomains to ignore such
>>> domains.
>>
>> From what i remember i also couldn't use "xl destroy" on such a domain (though i probably should by using the domain number instead of the name).
>> Perhaps the toolscripts should just uses the domain-id numbers instead of names for anything except printk's and echoing to the user ?
>
> Correct - once a domain enters the (null) state, you cannot use 'xl
> destroy' to kill the domain. As in my first post, the domain ID still
> exists, but it cannot be used. Is this a toolset bug?
Not really - it is a current Xen limitation.
Once a domain enters this state, there is literally nothing the
toolstack can do to further kill the domain.
One solution to the problem is for the outstanding granted pages to
transfer ownership to Xen, which would allow the rest of the domain to be
cleaned up. However, that would make it far less obvious when problems
like this do occur.
~Andrew
* Re: HVMs terminating as (null)
2013-11-25 10:50 ` Sander Eikelenboom
2013-11-25 10:56 ` Steven Haigh
@ 2013-11-25 11:08 ` Ian Campbell
1 sibling, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2013-11-25 11:08 UTC (permalink / raw)
To: Sander Eikelenboom; +Cc: Andrew Cooper, xen-devel, Steven Haigh
On Mon, 2013-11-25 at 11:50 +0100, Sander Eikelenboom wrote:
> Monday, November 25, 2013, 11:36:05 AM, you wrote:
>
> > On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
> >> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
> >> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
> >> any other domains.
>
> > I guess it should be a one liner, so please submit a patch. Not sure
> > what alternative string should be used, since you would want to avoid
> > clashing with any potential real domain's name.
>
> I didn't immediately spot the place where it was set to "null".
I think it is what you get from "printf("%s", NULL)" with glibc.
> Perhaps the toolscripts should just uses the domain-id numbers instead
> of names for anything except printk's and echoing to the user ?
"xl list" is the latter. You could perhaps add an option to print the
numeric domid instead (-n is commonly used for this I think) and use
that option in xendomains script?
Ian.
* Re: HVMs terminating as (null)
2013-11-23 16:14 HVMs terminating as (null) Steven Haigh
2013-11-23 16:18 ` Andrew Cooper
@ 2013-11-23 19:27 ` Olaf Hering
2013-11-23 19:38 ` Steven Haigh
1 sibling, 1 reply; 19+ messages in thread
From: Olaf Hering @ 2013-11-23 19:27 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On Sun, Nov 24, Steven Haigh wrote:
> Running Xen 4.2.3 with all the current XSA fixes.
How exactly did you start the guests?
Does 'ps faxu' show qemu processes for the listed domain_ids?
What is the 'xenstore-ls -f | sort' output?
Olaf
* Re: HVMs terminating as (null)
2013-11-23 19:27 ` Olaf Hering
@ 2013-11-23 19:38 ` Steven Haigh
2013-11-23 19:56 ` Steven Haigh
0 siblings, 1 reply; 19+ messages in thread
From: Steven Haigh @ 2013-11-23 19:38 UTC (permalink / raw)
To: xen-devel
On 24/11/13 06:27, Olaf Hering wrote:
> On Sun, Nov 24, Steven Haigh wrote:
>
>> Running Xen 4.2.3 with all the current XSA fixes.
>
> How exactly did you start the guests?
The DomUs were started with: xl create /etc/xen/<configfile>
> Does 'ps faxu' show qemu processes for the listed domain_ids?
> What is the 'xenstore-ls -f | sort' output?
I'll have to check this when I manage to reproduce it. So far, I have
been unable to find a reliable way to reproduce it. I managed to get a
system to do it every time an HVM DomU was shut down or restarted, but
after a reboot of the Dom0 I can't get it into that state again.
As soon as I can get a system in this state again, I'll leave it to see
what information I can extract.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: HVMs terminating as (null)
2013-11-23 19:38 ` Steven Haigh
@ 2013-11-23 19:56 ` Steven Haigh
2013-11-23 20:03 ` Andrew Cooper
0 siblings, 1 reply; 19+ messages in thread
From: Steven Haigh @ 2013-11-23 19:56 UTC (permalink / raw)
To: xen-devel
On 24/11/13 06:38, Steven Haigh wrote:
> On 24/11/13 06:27, Olaf Hering wrote:
>> On Sun, Nov 24, Steven Haigh wrote:
>>
>>> Running Xen 4.2.3 with all the current XSA fixes.
>>
>> How exactly did you start the guests?
>
> The DomUs were started with: xl create /etc/xen/<configfile>
>
>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>> What is the 'xenstore-ls -f | sort' output?
>
> I'll have to check this when I manage to reproduce it. So far, I have
> been unable to get a reliable way to reproduce it. I managed to get a
> system to do it every time a HVM DomU was shutdown OR restarted - but
> after a reboot of the Dom0 I can't get it into that state again.
>
> As soon as I can get a system in this state again, I'll leave it to see
> what information I can extract.
Ha! As always, as soon as I send this, I notice it's happened on a Dom0.
# xl list
Name              ID   Mem VCPUs  State    Time(s)
Domain-0           0  1579     2  r-----    2731.3
planner.vm         1  1013     1  -b----     189.3
(null)             2     0     1  --psrd     301.1
tracker.vm         3  1013     2  -b----     834.4
Attached is the output of:
# xl debug-keys q
# xl dmesg > xen-dmesg.log
# gzip xen-dmesg.log
> Does 'ps faxu' show qemu processes for the listed domain_ids?
I only see a qemu process for the running DomUs - no dead or extra ones.
> What is the 'xenstore-ls -f | sort' output?
Attached as xenstore-ls.log.gz
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
[-- Attachment #1.1.2: xen-dmesg.log.gz --]
[-- Type: application/gzip, Size: 3355 bytes --]
[-- Attachment #1.1.3: xenstore-ls.log.gz --]
[-- Type: application/gzip, Size: 2314 bytes --]
* Re: HVMs terminating as (null)
2013-11-23 19:56 ` Steven Haigh
@ 2013-11-23 20:03 ` Andrew Cooper
2013-11-23 20:09 ` Steven Haigh
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2013-11-23 20:03 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 23/11/13 19:56, Steven Haigh wrote:
> On 24/11/13 06:38, Steven Haigh wrote:
>> On 24/11/13 06:27, Olaf Hering wrote:
>>> On Sun, Nov 24, Steven Haigh wrote:
>>>
>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>
>>> How exactly did you start the guests?
>>
>> The DomUs were started with: xl create /etc/xen/<configfile>
>>
>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>> What is the 'xenstore-ls -f | sort' output?
>>
>> I'll have to check this when I manage to reproduce it. So far, I have
>> been unable to get a reliable way to reproduce it. I managed to get a
>> system to do it every time a HVM DomU was shutdown OR restarted - but
>> after a reboot of the Dom0 I can't get it into that state again.
>>
>> As soon as I can get a system in this state again, I'll leave it to see
>> what information I can extract.
>
> Ha! As always, as soon as I send this, I notice its happened on a Dom0.
>
> # xl list
> Name              ID   Mem VCPUs  State    Time(s)
> Domain-0           0  1579     2  r-----    2731.3
> planner.vm         1  1013     1  -b----     189.3
> (null)             2     0     1  --psrd     301.1
> tracker.vm         3  1013     2  -b----     834.4
>
> Attached is the output of:
> # xl debug-keys q
> # xl dmesg > xen-dmesg.log
> # gzip xen-dmesg.log
Ok - from dmesg.
(XEN) General information for domain 2:
(XEN) refcnt=1 dying=2 pause_count=2
(XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
(XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
...
(XEN) Memory pages belonging to domain 2:
(XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
(XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
(XEN) PoD entries=0 cachesize=0
So there are indeed two outstanding pages causing this domain to become
a zombie. They are normal pages, with 1 outstanding ref.
Can you collect "xl debug-keys g" as well?
~Andrew
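The reading Andrew does by eye can be approximated with a short parser. The sketch below scans `xl debug-keys q` output as saved via `xl dmesg` and reports, for every domain whose `dying` field is non-zero, the pages still listed for it. The function name and regular expressions are assumptions based only on the lines quoted in this thread, so other Xen versions may print a different layout. The `caf` and `taf` values are returned as raw integers without further interpretation.

```python
import re

Q_DUMP_SAMPLE = """\
(XEN) General information for domain 2:
(XEN) refcnt=1 dying=2 pause_count=2
(XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
(XEN) Memory pages belonging to domain 2:
(XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
(XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
"""

SECTION_RE = re.compile(
    r"(?:General information for|Memory pages belonging to) domain (\d+):")
DYING_RE = re.compile(r"\bdying=(\d+)")
PAGE_RE = re.compile(r"DomPage ([0-9a-f]+): caf=([0-9a-f]+), taf=([0-9a-f]+)")


def dying_domains(dump):
    """Map each dying domain ID to the DomPage entries still listed for it."""
    domains = {}
    current = None
    for line in dump.splitlines():
        section = SECTION_RE.search(line)
        if section:
            # Both section headers name the domain the following lines describe.
            current = domains.setdefault(int(section.group(1)),
                                         {"dying": 0, "pages": []})
            continue
        if current is None:
            continue
        dying = DYING_RE.search(line)
        if dying:
            current["dying"] = int(dying.group(1))
        page = PAGE_RE.search(line)
        if page:
            frame, caf, taf = (int(g, 16) for g in page.groups())
            current["pages"].append({"frame": frame, "caf": caf, "taf": taf})
    # Keep only domains that Xen reports as dying.
    return {domid: d["pages"] for domid, d in domains.items() if d["dying"]}


for domid, pages in dying_domains(Q_DUMP_SAMPLE).items():
    print(domid, [hex(p["frame"]) for p in pages])
```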
* Re: HVMs terminating as (null)
2013-11-23 20:03 ` Andrew Cooper
@ 2013-11-23 20:09 ` Steven Haigh
2013-11-23 20:26 ` Andrew Cooper
0 siblings, 1 reply; 19+ messages in thread
From: Steven Haigh @ 2013-11-23 20:09 UTC (permalink / raw)
To: xen-devel
On 24/11/13 07:03, Andrew Cooper wrote:
> On 23/11/13 19:56, Steven Haigh wrote:
>> On 24/11/13 06:38, Steven Haigh wrote:
>>> On 24/11/13 06:27, Olaf Hering wrote:
>>>> On Sun, Nov 24, Steven Haigh wrote:
>>>>
>>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>>
>>>> How exactly did you start the guests?
>>>
>>> The DomUs were started with: xl create /etc/xen/<configfile>
>>>
>>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>>> What is the 'xenstore-ls -f | sort' output?
>>>
>>> I'll have to check this when I manage to reproduce it. So far, I have
>>> been unable to get a reliable way to reproduce it. I managed to get a
>>> system to do it every time a HVM DomU was shutdown OR restarted - but
>>> after a reboot of the Dom0 I can't get it into that state again.
>>>
>>> As soon as I can get a system in this state again, I'll leave it to see
>>> what information I can extract.
>>
>> Ha! As always, as soon as I send this, I notice its happened on a Dom0.
>>
>> # xl list
>> Name              ID   Mem VCPUs  State    Time(s)
>> Domain-0           0  1579     2  r-----    2731.3
>> planner.vm         1  1013     1  -b----     189.3
>> (null)             2     0     1  --psrd     301.1
>> tracker.vm         3  1013     2  -b----     834.4
>>
>> Attached is the output of:
>> # xl debug-keys q
>> # xl dmesg > xen-dmesg.log
>> # gzip xen-dmesg.log
>
> Ok - from dmesg.
>
> (XEN) General information for domain 2:
> (XEN) refcnt=1 dying=2 pause_count=2
> (XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
> (XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
> (XEN) paging assistance: hap refcounts translate external
> ...
> (XEN) Memory pages belonging to domain 2:
> (XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
> (XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
> (XEN) PoD entries=0 cachesize=0
>
>
> So there are indeed two outstanding pages causing this domain to become
> a zombie. They are normal pages, with 1 outstanding ref.
>
> Can you collect "xl debug-keys g" as well?
Sure - attached.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
[-- Attachment #1.1.2: debug-keys-g.log.gz --]
[-- Type: application/gzip, Size: 3188 bytes --]
* Re: HVMs terminating as (null)
2013-11-23 20:09 ` Steven Haigh
@ 2013-11-23 20:26 ` Andrew Cooper
2013-11-24 17:14 ` Wei Liu
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2013-11-23 20:26 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 23/11/13 20:09, Steven Haigh wrote:
> On 24/11/13 07:03, Andrew Cooper wrote:
>> On 23/11/13 19:56, Steven Haigh wrote:
>>> On 24/11/13 06:38, Steven Haigh wrote:
>>>> On 24/11/13 06:27, Olaf Hering wrote:
>>>>> On Sun, Nov 24, Steven Haigh wrote:
>>>>>
>>>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>>>
>>>>> How exactly did you start the guests?
>>>>
>>>> The DomUs were started with: xl create /etc/xen/<configfile>
>>>>
>>>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>>>> What is the 'xenstore-ls -f | sort' output?
>>>>
>>>> I'll have to check this when I manage to reproduce it. So far, I have
>>>> been unable to get a reliable way to reproduce it. I managed to get a
>>>> system to do it every time a HVM DomU was shutdown OR restarted - but
>>>> after a reboot of the Dom0 I can't get it into that state again.
>>>>
>>>> As soon as I can get a system in this state again, I'll leave it to see
>>>> what information I can extract.
>>>
>>> Ha! As always, as soon as I send this, I notice its happened on a Dom0.
>>>
>>> # xl list
>>> Name              ID   Mem VCPUs  State    Time(s)
>>> Domain-0           0  1579     2  r-----    2731.3
>>> planner.vm         1  1013     1  -b----     189.3
>>> (null)             2     0     1  --psrd     301.1
>>> tracker.vm         3  1013     2  -b----     834.4
>>>
>>> Attached is the output of:
>>> # xl debug-keys q
>>> # xl dmesg > xen-dmesg.log
>>> # gzip xen-dmesg.log
>>
>> Ok - from dmesg.
>>
>> (XEN) General information for domain 2:
>> (XEN) refcnt=1 dying=2 pause_count=2
>> (XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
>> (XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
>> (XEN) paging assistance: hap refcounts translate external
>> ...
>> (XEN) Memory pages belonging to domain 2:
>> (XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
>> (XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
>> (XEN) PoD entries=0 cachesize=0
>>
>>
>> So there are indeed two outstanding pages causing this domain to become
>> a zombie. They are normal pages, with 1 outstanding ref.
>>
>> Can you collect "xl debug-keys g" as well?
>
> Sure - attached.
(XEN) -------- active -------- -------- shared --------
(XEN) [ref] localdom mfn pin localdom gmfn flags
(XEN) grant-table for remote domain: 2 (v1)
(XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
(XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
Ok - so domain 2 has two outstanding grants. This explains why it is a
zombie.
Both these grants are GTF_writing | GTF_reading | GTF_permit_access, but
seemingly unmapped.
I will have to defer to someone who knows the grant code better. Is it
possible for a domain to be a zombie just because it has two grants it
hasn't manually invalidated?
~Andrew
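The flag value 0x19 in the dump can be decoded mechanically. The Python sketch below uses the bit values from Xen's public grant_table.h header (type in the low two bits, then GTF_readonly, GTF_reading and GTF_writing) and parses the two grant lines quoted above. The function names and the line regex are illustrative assumptions, and higher flag bits are deliberately ignored to keep the example short.

```python
import re

G_DUMP_SAMPLE = """\
(XEN) [ref] localdom mfn pin localdom gmfn flags
(XEN) grant-table for remote domain: 2 (v1)
(XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
(XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
"""

# Grant entry type lives in the low two bits of the flags field.
GTF_TYPES = {0: "GTF_invalid", 1: "GTF_permit_access", 2: "GTF_accept_transfer"}
# Status bits that follow the type field.
GTF_BITS = [(1 << 2, "GTF_readonly"), (1 << 3, "GTF_reading"), (1 << 4, "GTF_writing")]

GRANT_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+0x([0-9a-f]+)\s+0x([0-9a-f]+)"
    r"\s+(\d+)\s+0x([0-9a-f]+)\s+0x([0-9a-f]+)")


def decode_gtf(flags):
    """Return the symbolic names for a grant entry's flags value."""
    entry_type = flags & 0x3
    names = [GTF_TYPES.get(entry_type, "type=%d" % entry_type)]
    names += [name for bit, name in GTF_BITS if flags & bit]
    return names


def parse_grants(dump):
    """Extract ref, mfn, pin and decoded flags from 'debug-keys g' lines."""
    grants = []
    for line in dump.splitlines():
        m = GRANT_RE.search(line)
        if m:
            grants.append({"ref": int(m.group(1)),
                           "mfn": int(m.group(3), 16),
                           "pin": int(m.group(4), 16),
                           "flags": decode_gtf(int(m.group(7), 16))})
    return grants


for grant in parse_grants(G_DUMP_SAMPLE):
    print(grant["ref"], hex(grant["mfn"]), " | ".join(grant["flags"]))
```

The two mfn values, 0x0866e1 and 0x0866e0, are the same frames as the two DomPage entries in the earlier `debug-keys q` dump, which ties the outstanding grants to the pages keeping domain 2 alive.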
* Re: HVMs terminating as (null)
2013-11-23 20:26 ` Andrew Cooper
@ 2013-11-24 17:14 ` Wei Liu
2013-11-25 1:19 ` Steven Haigh
0 siblings, 1 reply; 19+ messages in thread
From: Wei Liu @ 2013-11-24 17:14 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Steven Haigh, xen-devel@lists.xen.org
On Sat, Nov 23, 2013 at 8:26 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 23/11/13 20:09, Steven Haigh wrote:
>> On 24/11/13 07:03, Andrew Cooper wrote:
>>> On 23/11/13 19:56, Steven Haigh wrote:
>>>> On 24/11/13 06:38, Steven Haigh wrote:
>>>>> On 24/11/13 06:27, Olaf Hering wrote:
>>>>>> On Sun, Nov 24, Steven Haigh wrote:
>>>>>>
>>>>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>>>>
>>>>>> How exactly did you start the guests?
>>>>>
>>>>> The DomUs were started with: xl create /etc/xen/<configfile>
>>>>>
>>>>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>>>>> What is the 'xenstore-ls -f | sort' output?
>>>>>
>>>>> I'll have to check this when I manage to reproduce it. So far, I have
>>>>> been unable to get a reliable way to reproduce it. I managed to get a
>>>>> system to do it every time a HVM DomU was shutdown OR restarted - but
>>>>> after a reboot of the Dom0 I can't get it into that state again.
>>>>>
>>>>> As soon as I can get a system in this state again, I'll leave it to see
>>>>> what information I can extract.
>>>>
>>>> Ha! As always, as soon as I send this, I notice its happened on a Dom0.
>>>>
>>>> # xl list
>>>> Name           ID   Mem VCPUs   State   Time(s)
>>>> Domain-0        0  1579     2  r-----    2731.3
>>>> planner.vm      1  1013     1  -b----     189.3
>>>> (null)          2     0     1  --psrd     301.1
>>>> tracker.vm      3  1013     2  -b----     834.4
>>>>
>>>> Attached is the output of:
>>>> # xl debug-keys q
>>>> # xl dmesg > xen-dmesg.log
>>>> # gzip xen-dmesg.log
>>>
>>> Ok - from dmesg.
>>>
>>> (XEN) General information for domain 2:
>>> (XEN) refcnt=1 dying=2 pause_count=2
>>> (XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0
>>> dirty_cpus={} max_pages=262400
>>> (XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
>>> (XEN) paging assistance: hap refcounts translate external
>>> ...
>>> (XEN) Memory pages belonging to domain 2:
>>> (XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
>>> (XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
>>> (XEN) PoD entries=0 cachesize=0
>>>
>>>
>>> So there are indeed two outstanding pages causing this domain to become
>>> a zombie. They are normal pages, with 1 outstanding ref.
>>>
>>> Can you collect "xl debug-keys g" as well?
>>
>> Sure - attached.
>
> (XEN) -------- active -------- -------- shared --------
> (XEN) [ref] localdom mfn pin localdom gmfn flags
> (XEN) grant-table for remote domain: 2 (v1)
> (XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
> (XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
>
> Ok - so domain 2 has two outstanding grants. This explains why it is a
> zombie.
>
> Both these grants are GFT_writing | GFT_reading | GFT_permit_access, but
> seemingly unmapped.
>
I haven't gone through the whole thread. Is there any chance you upgraded
your Dom0 kernel?
It is possible that you are missing some upstream patches.
Check out <527B8465.6050901@citrix.com>
Wei.
> I will have to defer to someone who knows the grant code better. Is it
> possible for a domain to be a zombie just because it has two grants it
> hasn't manually invalidated?
>
> ~Andrew
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: HVMs terminating as (null)
2013-11-24 17:14 ` Wei Liu
@ 2013-11-25 1:19 ` Steven Haigh
2013-11-25 11:07 ` Wei Liu
0 siblings, 1 reply; 19+ messages in thread
From: Steven Haigh @ 2013-11-25 1:19 UTC (permalink / raw)
To: xen-devel
On 25/11/2013 4:14 AM, Wei Liu wrote:
> On Sat, Nov 23, 2013 at 8:26 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> (XEN) -------- active -------- -------- shared --------
>> (XEN) [ref] localdom mfn pin localdom gmfn flags
>> (XEN) grant-table for remote domain: 2 (v1)
>> (XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
>> (XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
>>
>> Ok - so domain 2 has two outstanding grants. This explains why it is a
>> zombie.
>>
>> Both these grants are GFT_writing | GFT_reading | GFT_permit_access, but
>> seemingly unmapped.
>>
>
> I didn't go through the whole thread, is there any chance you upgraded
> your Dom0 kernel?
>
> It is possible that you miss some upstream patches.
The Dom0 kernel is currently 3.11.7 on the system I've seen the problem
on after only a few hours of uptime. I'm in the middle of pushing 3.11.9
to that system. I use the vanilla kernel from kernel.org for all my Dom0
systems.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: HVMs terminating as (null)
2013-11-25 1:19 ` Steven Haigh
@ 2013-11-25 11:07 ` Wei Liu
0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2013-11-25 11:07 UTC (permalink / raw)
To: Steven Haigh; +Cc: wei.liu2, xen-devel
On Mon, Nov 25, 2013 at 12:19:40PM +1100, Steven Haigh wrote:
> On 25/11/2013 4:14 AM, Wei Liu wrote:
> > On Sat, Nov 23, 2013 at 8:26 PM, Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> (XEN) -------- active -------- -------- shared --------
> >> (XEN) [ref] localdom mfn pin localdom gmfn flags
> >> (XEN) grant-table for remote domain: 2 (v1)
> >> (XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
> >> (XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
> >>
> >> Ok - so domain 2 has two outstanding grants. This explains why it is a
> >> zombie.
> >>
> >> Both these grants are GFT_writing | GFT_reading | GFT_permit_access, but
> >> seemingly unmapped.
> >>
> >
> > I didn't go through the whole thread, is there any chance you upgraded
> > your Dom0 kernel?
> >
> > It is possible that you miss some upstream patches.
>
> The Dom0 kernel is currently 3.11.7 on the system I've seen the problem
> on after only a few hours of uptime. I'm in the middle of pushing 3.11.9
> to that system. I use the vanilla kernel from kernel.org for all my Dom0
> systems.
>
Yes, 3.11.7 is missing those two patches, which 3.11.9 has.
They should fix your issue.
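To confirm after the kernel update that no zombies are left behind, the `xl list` output can be scanned for domains whose name shows as (null) and whose state has the dying flag (d) set. This is only a rough sketch: find_zombie_domains is a hypothetical helper, and it assumes `xl list` prints one domain per line with the six columns shown earlier in the thread (the mail client wrapped the Time(s) column onto its own line in the pasted output).

```python
def find_zombie_domains(xl_list_output):
    """Return the IDs of domains listed as (null) with the dying flag set.

    Hypothetical helper; expects the text printed by `xl list`, with the
    columns Name, ID, Mem, VCPUs, State, Time(s) on a single line each.
    """
    zombies = []
    for line in xl_list_output.splitlines():
        # Split from the right so that a name containing spaces stays whole.
        fields = line.rsplit(None, 5)
        if len(fields) != 6:
            continue
        name, domid, _mem, _vcpus, state, _time = fields
        if name.strip() == "(null)" and "d" in state:
            zombies.append(int(domid))
    return zombies


# Sample taken from the listing earlier in this thread, unwrapped.
sample = """\
Name           ID   Mem VCPUs   State   Time(s)
Domain-0        0  1579     2  r-----    2731.3
planner.vm      1  1013     1  -b----     189.3
(null)          2     0     1  --psrd     301.1
tracker.vm      3  1013     2  -b----     834.4
"""
print(find_zombie_domains(sample))
# prints: [2]
```

An empty result after a few HVM shutdown and reboot cycles on 3.11.9 would suggest the patches did the job.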
Wei.
> --
> Steven Haigh
>
> Email: netwiz@crc.id.au
> Web: https://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299
>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-11-25 11:08 UTC | newest]
Thread overview: 19+ messages
2013-11-23 16:14 HVMs terminating as (null) Steven Haigh
2013-11-23 16:18 ` Andrew Cooper
2013-11-23 16:38 ` Sander Eikelenboom
2013-11-25 10:36 ` Ian Campbell
2013-11-25 10:47 ` Steven Haigh
2013-11-25 10:50 ` Sander Eikelenboom
2013-11-25 10:56 ` Steven Haigh
2013-11-25 11:02 ` Ian Campbell
2013-11-25 11:02 ` Andrew Cooper
2013-11-25 11:08 ` Ian Campbell
2013-11-23 19:27 ` Olaf Hering
2013-11-23 19:38 ` Steven Haigh
2013-11-23 19:56 ` Steven Haigh
2013-11-23 20:03 ` Andrew Cooper
2013-11-23 20:09 ` Steven Haigh
2013-11-23 20:26 ` Andrew Cooper
2013-11-24 17:14 ` Wei Liu
2013-11-25 1:19 ` Steven Haigh
2013-11-25 11:07 ` Wei Liu