From: Eduardo Otubo
Date: Wed, 29 Jul 2015 12:09:08 +0200
Subject: Re: [Qemu-devel] Live migration hangs after migration to remote host
To: "Dr. David Alan Gilbert"
Cc: Qemu-devel
Message-ID: <20150729100908.GA22821@vader>
In-Reply-To: <20150729093259.GD2267@work-vm>

On Wed, Jul 29, 2015 at 10:32:59AM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> > On Wed, Jul 29, 2015 at 09:11:21AM +0100, Dr. David Alan Gilbert wrote:
> > > * Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> > > > On Tue, Jul 28, 2015 at 04:19:46PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > I'm facing a weird behavior on my tests: I am able to live migrate
> > > > > > between two virtual machines on my localhost, but not to another
> > > > > > machine, both using tcp.
> > > > > >
> > > > > > * I am using the same arguments on the command line;
> > > > > > * Both virtual machines use the same qcow2 file visible through NFS;
> > > > > > * Both machines are in the same subnet;
> > > > > > * Migration is being done from intel to intel;
> > > > > > * Same version of Qemu (github master - f8787f8723);
> > > > > >
> > > > > > Using all above I am able to live migrate on the same host: between two
> > > > > > vms on local host or between two vms in the remote host; but when
> > > > > > migrating from local to remote, the guest hangs. I still can access its
> > > > > > console via ctrl+alt+2, though, and everything seems to be normal. If I
> > > > > > issue a reboot via console on the remote, the guest gets back to
> > > > > > normal.
> > > > > >
> > > > > > Am I missing something here?
> > > > >
> > > > > Just checking, but are you saying that as far as qemu is concerned, the migration
> > > > > is happy, it's just the guest that's hung?
> > > >
> > > > That's exactly the case. The console (via ctrl+alt+2) is active and
> > > > responding to all commands normally, but the screen (ctrl+alt+1) is
> > > > frozen and I can't interact with it at all.
> > >
> > > Are you driving this via libvirt or using qemu monitor directly?
> > > If the latter, can you please get an 'info migrate' from the source
> > > and an 'info status' from the destination at the end of migrate.
> >
> > I'm using qemu command line directly. And I got the problem :) See
> > below.
> >
> > >
> > > > > Are the host clocks on the two hosts very close (there are lots of
> > > > > weird corner cases with mismatched clocks) - same time zone?
> > > >
> > > > Yep. Both machines are in the same room and have the clock sync'ed.
> > >
> > > OK, good.
> > >
> > > > >
> > > > > Are you using cache=none (given that it's NFS shared)
> > > >
> > > > I wasn't. But I tried again with cache=none and I got exactly the same
> > > > thing.
> > >
> > > OK, and this pair of machines, have you tried both directions - i.e.
> > > going a->b and b->a - do both directions fail?
> > > Is the NFS server one of the two machines? If it is, and you're using libvirt,
> > > make sure that the directory the disks are on is an NFS mount on both
> > > machines; e.g. don't migrate directly from the NFS export.
> > >
> > > > Also, I tried with stable-2.2 branch and got the same behavior. I really
> > > > think it's very unlikely to have unstable code for such an important
> > > > feature upstream, or on a stable- branch. Most probably
> > > > I have something wrong in my environment.
> > >
> > > Yes, the challenge is to find what; and if it's something common
> > > we should try and find a way of spotting it.
> > >
> > > > Anyway, I'll keep testing different stable- branches until I find
> > > > something that works for me. I'll keep the mailing list posted.
> > >
> > > Could you share the qemu command line so we can see if we can
> > > spot anything?
> >
> > Got the problem! I tried to simplify my qemu command line to the
> > smallest possible, excluding things I thought could cause the issue.
> > Without further ado, this is the argument:
> >
> > -cpu 'Opteron_G4'
> >
> > Without this argument everything works as it should, console responsive
> > and guest active :)
>
> Can you show cat /proc/cpuinfo off the two hosts?
> (Only one CPU, but please include the whole entry)

Intel host:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping        : 3
microcode       : 0x1c
cpu MHz         : 883.468
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs            :
bogomips        : 6784.87
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

AMD host:

processor       : 5
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 10
model name      : AMD Phenom(tm) II X6 1075T Processor
stepping        : 0
microcode       : 0x10000bf
cpu MHz         : 800.000
cache size      : 512 KB
physical id     : 0
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
bogomips        : 6027.25
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate cpb

> Dave
>
> > It says on the documentation[1] that it's possible to migrate between
> > AMD and Intel, but I think I got a corner case. Apparently I can't
> > specify the exact CPU model. Is this a known issue? Couldn't find any
> > reference on bugzilla or launchpad.
> >
> > [1] - http://www.linux-kvm.org/page/Migration
> >
> > --
> > Eduardo Otubo
> > ProfitBricks GmbH
>
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>

--
Eduardo Otubo
ProfitBricks GmbH
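
For anyone reading the two /proc/cpuinfo entries in this message, one quick way to compare them is to diff their flags lines. The snippet below is only a rough sketch: `parse_flags` and `compare_flags` are hypothetical helper names made up for illustration and are not part of QEMU or libvirt. The flag strings are copied from the two entries in this message. It shows which flags the two hosts share and which are host-specific; it makes no claim about which flags a given `-cpu` model actually requires.

```python
# Rough sketch: compare the "flags" lines of two /proc/cpuinfo entries.
# parse_flags and compare_flags are hypothetical helpers for illustration only.

INTEL_CPUINFO = """\
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
"""

AMD_CPUINFO = """\
model name : AMD Phenom(tm) II X6 1075T Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
"""


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of a cpuinfo entry."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def compare_flags(cpuinfo_a, cpuinfo_b):
    """Return (common, only_in_a, only_in_b) as sets of flag names."""
    flags_a = parse_flags(cpuinfo_a)
    flags_b = parse_flags(cpuinfo_b)
    return flags_a & flags_b, flags_a - flags_b, flags_b - flags_a


if __name__ == "__main__":
    common, intel_only, amd_only = compare_flags(INTEL_CPUINFO, AMD_CPUINFO)
    print("common:    ", " ".join(sorted(common)))
    print("Intel only:", " ".join(sorted(intel_only)))
    print("AMD only:  ", " ".join(sorted(amd_only)))
```

On a live system the same helpers could be fed the contents of /proc/cpuinfo from each host instead of the pasted strings.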