xen-devel.lists.xenproject.org archive mirror
From: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
To: xen-devel@lists.xen.org, glenn@rimuhosting.com
Cc: Juergen Gross <jgross@suse.com>
Subject: Re: null domains after xl destroy
Date: Tue, 11 Apr 2017 11:49:48 +0200	[thread overview]
Message-ID: <3385656.IoOB642KYU@amur> (raw)
In-Reply-To: <70eae378-2392-bd82-670a-5dafff58c259@rimuhosting.com>

On Tuesday, 11 April 2017, 20:03:14, Glenn Enright wrote:
> On 11/04/17 17:59, Juergen Gross wrote:
> > On 11/04/17 07:25, Glenn Enright wrote:
> >> Hi all
> >>
> >> We are seeing an odd issue with domU domains after xl destroy: under
> >> recent 4.9 kernels a (null) domain is left behind.
> >
> > I guess this is the dom0 kernel version?
> >
> >> This has occurred on a variety of hardware, with no obvious commonality.
> >>
> >> 4.4.55 does not show this behavior.
> >>
> >> On my test machine I have the following packages installed under
> >> centos6, from https://xen.crc.id.au/
> >>
> >> ~]# rpm -qa | grep xen
> >> xen47-licenses-4.7.2-4.el6.x86_64
> >> xen47-4.7.2-4.el6.x86_64
> >> kernel-xen-4.9.21-1.el6xen.x86_64
> >> xen47-ocaml-4.7.2-4.el6.x86_64
> >> xen47-libs-4.7.2-4.el6.x86_64
> >> xen47-libcacard-4.7.2-4.el6.x86_64
> >> xen47-hypervisor-4.7.2-4.el6.x86_64
> >> xen47-runtime-4.7.2-4.el6.x86_64
> >> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>
> >> I've also replicated the issue with 4.9.17 and 4.9.20
> >>
> >> To replicate, on a cleanly booted dom0 with one PV VM, I run the
> >> following on the VM
> >>
> >> {
> >> while true; do
> >>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >> done
> >> }
> >>
> >> Then on the dom0 I run this sequence to reliably get a null domain. This
> >> occurs with both oxenstored and xenstored.
> >>
> >> {
> >> xl sync 1
> >> xl destroy 1
> >> }
> >>
> >> xl list then renders something like ...
> >>
> >> (null)                                       1     4     4     --p--d       9.8     0
> >
> > Something is referencing the domain, e.g. some of its memory pages are
> > still mapped by dom0.

You can try
# xl debug-keys q
and further
# xl dmesg
to see the output of the previous command. The 'q' dumps domain
(and guest debug) info.
# xl debug-keys h
lists all available debug keys, for more information.
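
A minimal sketch of capturing that dump in a file (the file name here is
only an example):
# xl debug-keys q
# xl dmesg > /tmp/xen-dmesg-q.txt
# tail -n 100 /tmp/xen-dmesg-q.txt
The 'q' output appears at the end of the hypervisor console ring, so the
tail should show it.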

Dietmar.

> >
> >> From what I can see it appears to be disk related. Affected VMs all use
> >> LVM storage for their boot disk. lvdisplay of the affected LV shows that
> >> the LV is being held open by something.
> >
> > How are the disks configured? Especially the backend type is important.
> >
> >>
> >> ~]# lvdisplay test/test.img | grep open
> >>   # open                 1
> >>
> >> I've not yet been able to determine what that is. I tried lsof,
> >> dmsetup, and various LVM tools. Waiting for the disk to be released
> >> does not work.
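
Besides lsof, a minimal sketch of where else one might look for the holder
(the device-mapper name and the dm-N below are guesses based on the
test/test.img LV; adjust them to what dmsetup actually reports):
# dmsetup info -c | grep test
shows the Open count and the dm-N name of the volume,
# ls /sys/block/dm-N/holders/
lists kernel-side holders of that device, and
# fuser -v /dev/mapper/test-test.img
lists userspace processes that still have it open. If the holders directory
is empty and fuser shows nothing, the remaining reference is presumably held
inside the kernel, e.g. by the blkback backend.
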
> >>
> >> ~]# xl list
> >> Name                                        ID   Mem VCPUs      State   Time(s)
> >> Domain-0                                     0  1512     2     r-----      29.0
> >> (null)                                       1     4     4     --p--d       9.8
> >>
> >> xenstore-ls reports nothing for the null domain id that I can see.
> >
> > Any qemu process related to the domain still running?
> >
> > Any dom0 kernel messages related to Xen?
> >
> >
> > Juergen
> >
> 
> Yep, 4.9 dom0 kernel
> 
> Typically we see an xl process still running, but that has already gone
> away in this case. The domU is a PV guest using a phy disk definition; the
> basic startup is like this...
> 
> xl -v create -f paramfile extra="console=hvc0 elevator=noop xen-blkfront.max=64"
> 
> There are no qemu processes or threads anywhere I can see.
> 
> I don't see any meaningful messages in the Linux kernel log, and nothing
> at all in the hypervisor log. Here is the output from the dom0 starting and
> then stopping a domU using the above mechanism:
> 
> br0: port 2(vif3.0) entered disabled state
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 entered promiscuous mode
> IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
> xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> vif vif-4-0 vif4.0: Guest Rx ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered forwarding state
> br0: port 2(vif4.0) entered disabled state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 left promiscuous mode
> br0: port 2(vif4.0) entered disabled state
> 
> ... here is xl info ...
> 
> host                   : xxxxxxxxxxxx
> release                : 4.9.21-1.el6xen.x86_64
> version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
> machine                : x86_64
> nr_cpus                : 4
> max_cpu_id             : 3
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : 
> b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
> virt_caps              :
> total_memory           : 8190
> free_memory            : 6577
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 7
> xen_extra              : .2
> xen_version            : 4.7.2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 
> dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
> cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
> cc_compile_by          : mockbuild
> cc_compile_domain      : (none)
> cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
> build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
> xend_config_format     : 4
> 
> 
> # cat /proc/cmdline
> ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS 
> KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx 
> SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet 
> pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64
> 
> The domU is using an LVM volume on top of an md RAID1 array, on directly
> connected HDDs. Nothing special hardware-wise. The disk line for that domU
> looks functionally like...
> 
> disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
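
For context, a minimal sketch of how that disk line might sit in a full PV
guest config (everything here other than the disk line is a guess, not the
actual paramfile):

name       = "testdomu"
memory     = 1024
vcpus      = 4
bootloader = "pygrub"
disk       = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
vif        = [ 'bridge=br0' ]
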
> 
> I would appreciate any suggestions on how to increase the debug level in 
> a relevant way or where to look to get more useful information on what 
> is happening.
> 
> To clarify the actual shutdown sequence that causes problems...
> 
> # xl sysrq $id s
> # xl destroy $id
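
Put together as a single dom0-side sketch (the domU name and the final grep
are only illustrative, not taken from the original setup):
{
id=$(xl domid testdomu)
xl sysrq $id s
xl destroy $id
xl list | grep '(null)'
}
Any leftover (null) entry in that last listing reproduces the problem.
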
> 
> 
> Regards, Glenn
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Thread overview: 28+ messages
2017-04-11  5:25 null domains after xl destroy Glenn Enright
2017-04-11  5:59 ` Juergen Gross
2017-04-11  8:03   ` Glenn Enright
2017-04-11  9:49     ` Dietmar Hahn [this message]
2017-04-11 22:13       ` Glenn Enright
2017-04-11 22:23         ` Andrew Cooper
2017-04-11 22:45           ` Glenn Enright
2017-04-18  8:36             ` Juergen Gross
2017-04-19  1:02               ` Glenn Enright
2017-04-19  4:39                 ` Juergen Gross
2017-04-19  7:16                   ` Roger Pau Monné
2017-04-19  7:35                     ` Juergen Gross
2017-04-19 10:09                     ` Juergen Gross
2017-04-19 16:22                       ` Steven Haigh
2017-04-21  8:42                         ` Steven Haigh
2017-04-21  8:44                           ` Juergen Gross
2017-05-01  0:55                       ` Glenn Enright
2017-05-03 10:45                         ` Steven Haigh
2017-05-03 13:38                           ` Juergen Gross
2017-05-03 15:53                           ` Juergen Gross
2017-05-03 16:58                             ` Steven Haigh
2017-05-03 22:17                               ` Glenn Enright
2017-05-08  9:10                                 ` Juergen Gross
2017-05-09  9:24                                   ` Roger Pau Monné
2017-05-13  4:02                                     ` Glenn Enright
2017-05-15  9:57                                       ` Juergen Gross
2017-05-16  0:49                                         ` Glenn Enright
2017-05-16  1:18                                           ` Steven Haigh
