From: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
To: xen-devel@lists.xen.org, glenn@rimuhosting.com
Cc: Juergen Gross <jgross@suse.com>
Subject: Re: null domains after xl destroy
Date: Tue, 11 Apr 2017 11:49:48 +0200
Message-ID: <3385656.IoOB642KYU@amur>
In-Reply-To: <70eae378-2392-bd82-670a-5dafff58c259@rimuhosting.com>

On Tuesday, 11 April 2017, 20:03:14, Glenn Enright wrote:
> On 11/04/17 17:59, Juergen Gross wrote:
> > On 11/04/17 07:25, Glenn Enright wrote:
> >> Hi all
> >>
> >> We are seeing an odd issue with domU domains after xl destroy: under
> >> recent 4.9 kernels a (null) domain is left behind.
> >
> > I guess this is the dom0 kernel version?
> >
> >> This has occurred on a variety of hardware, with no obvious commonality.
> >>
> >> 4.4.55 does not show this behavior.
> >>
> >> On my test machine I have the following packages installed under
> >> centos6, from https://xen.crc.id.au/
> >>
> >> ~]# rpm -qa | grep xen
> >> xen47-licenses-4.7.2-4.el6.x86_64
> >> xen47-4.7.2-4.el6.x86_64
> >> kernel-xen-4.9.21-1.el6xen.x86_64
> >> xen47-ocaml-4.7.2-4.el6.x86_64
> >> xen47-libs-4.7.2-4.el6.x86_64
> >> xen47-libcacard-4.7.2-4.el6.x86_64
> >> xen47-hypervisor-4.7.2-4.el6.x86_64
> >> xen47-runtime-4.7.2-4.el6.x86_64
> >> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>
> >> I've also replicated the issue with 4.9.17 and 4.9.20
> >>
> >> To replicate, on a cleanly booted dom0 with one pv VM, I run the
> >> following on the VM
> >>
> >> {
> >> while true; do
> >>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >> done
> >> }
> >>
> >> Then on the dom0 I do this sequence to reliably get a null domain. This
> >> occurs with oxenstored and xenstored both.
> >>
> >> {
> >> xl sync 1
> >> xl destroy 1
> >> }
> >>
> >> xl list then renders something like ...
> >>
> >> (null)                                       1     4     4     --p--d       9.8     0
> >
> > Something is referencing the domain, e.g. some of its memory pages are
> > still mapped by dom0.

You can try
# xl debug-keys q
and further
# xl dmesg
to see the output of the previous command. The 'q' key dumps domain
(and guest debug) info.
# xl debug-keys h
prints all available debug keys if you need more information.
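
The two steps above can be wrapped into one helper, as a minimal sketch.
It assumes a root shell on dom0 with the xl toolstack in PATH; the
function name dump_xen_domain_info is made up for illustration.

```shell
# Hypothetical helper: trigger the 'q' debug key, then print the
# hypervisor console ring, where the key handler writes its output.
dump_xen_domain_info() {
    # Ask Xen to dump domain (and guest debug) info to its console.
    xl debug-keys q || return 1
    # Read the hypervisor message buffer that now contains the dump.
    xl dmesg
}
```

Redirecting the result to a file, e.g. "dump_xen_domain_info > xen-q.log",
makes it easier to attach the dump to a reply.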

Dietmar.

> >
> >> From what I can see it appears to be disk related. Affected VMs all use
> >> lvm storage for their boot disk. lvdisplay of the affected lv shows that
> >> the lv is being held open by something.
> >
> > How are the disks configured? Especially the backend type is important.
> >
> >>
> >> ~]# lvdisplay test/test.img | grep open
> >>   # open                 1
> >>
> >> I've not been able to determine what that thing is as yet. I tried lsof,
> >> dmsetup, various lv tools. Waiting for the disk to be released does not
> >> work.
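
For repeated checks, the open count shown by lvdisplay can be pulled out
with a small function. This is only a sketch: the name lv_open_count is
hypothetical, and it assumes lvdisplay prints the "# open" line in the
same layout as quoted above.

```shell
# Hypothetical helper: print the "# open" count that lvdisplay reports
# for the given logical volume (e.g. test/test.img).
lv_open_count() {
    # The line looks like "  # open                 1", so the count
    # is the third whitespace-separated field.
    lvdisplay "$1" | awk '$1 == "#" && $2 == "open" { print $3 }'
}
```

A value that stays above 0 after the domain has been destroyed means
something still holds the volume open.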
> >>
> >> ~]# xl list
> >> Name                                        ID   Mem VCPUs      State   Time(s)
> >> Domain-0                                     0  1512     2     r-----      29.0
> >> (null)                                       1     4     4     --p--d       9.8
> >>
> >> xenstore-ls reports nothing for the null domain id that I can see.
> >
> > Any qemu process related to the domain still running?
> >
> > Any dom0 kernel messages related to Xen?
> >
> >
> > Juergen
> >
> 
> Yep, 4.9 dom0 kernel
> 
> Typically we see an xl process running, but that has already gone away 
> in this case. The domU is a PV guest using phy definition, the basic 
> startup is like this...
> 
> xl -v create -f paramfile extra="console=hvc0 elevator=noop xen-blkfront.max=64"
> 
> There are no qemu processes or threads anywhere I can see.
> 
> I don't see any meaningful messages in the linux kernel log, and nothing 
> at all in the hypervisor log. Here is output from the dom0 starting and 
> then stopping a domU using the above mechanism
> 
> br0: port 2(vif3.0) entered disabled state
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 entered promiscuous mode
> IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
> xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> vif vif-4-0 vif4.0: Guest Rx ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered forwarding state
> br0: port 2(vif4.0) entered disabled state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 left promiscuous mode
> br0: port 2(vif4.0) entered disabled state
> 
> ... here is xl info ...
> 
> host                   : xxxxxxxxxxxx
> release                : 4.9.21-1.el6xen.x86_64
> version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
> machine                : x86_64
> nr_cpus                : 4
> max_cpu_id             : 3
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
> virt_caps              :
> total_memory           : 8190
> free_memory            : 6577
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 7
> xen_extra              : .2
> xen_version            : 4.7.2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
> cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
> cc_compile_by          : mockbuild
> cc_compile_domain      : (none)
> cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
> build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
> xend_config_format     : 4
> 
> 
> # cat /proc/cmdline
> ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64
> 
> The domU uses an LVM volume on top of an md RAID1 array on directly
> attached HDDs. Nothing special hardware-wise. The disk line for that
> domU looks functionally like...
> 
> disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
> 
> I would appreciate any suggestions on how to increase the debug level in 
> a relevant way or where to look to get more useful information on what 
> is happening.
> 
> To clarify the actual shutdown sequence that causes problems...
> 
> # xl sysrq $id s
> # xl destroy $id
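
The sequence above, followed by a check of xl list for a leftover
(null) entry, could be scripted roughly as follows. The function names
and the SETTLE_SECS variable are hypothetical, and the parsing assumes
xl list prints one domain per line with the name in the first column
and the ID in the second.

```shell
# Print the IDs of all domains that `xl list` shows as "(null)".
# Reads `xl list` output on stdin and skips the header line.
null_domain_ids() {
    awk 'NR > 1 && $1 == "(null)" { print $2 }'
}

# Hypothetical helper: sync the guest via sysrq, destroy it, wait a
# moment, then report whether the same ID lingers as a (null) domain.
destroy_and_check() {
    id=$1
    xl sysrq "$id" s
    xl destroy "$id"
    # Give the toolstack a few seconds to tear the domain down.
    sleep "${SETTLE_SECS:-5}"
    if xl list | null_domain_ids | grep -qx "$id"; then
        echo "domain $id left behind as (null)"
        return 1
    fi
    echo "domain $id fully destroyed"
}
```

Calling "destroy_and_check 1" returns non-zero when the domain is still
listed as (null), which makes the reproduction easy to loop over.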
> 
> 
> Regards, Glenn
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

-- 
Company details: http://ts.fujitsu.com/imprint.html