[Qemu-devel] Live migration hangs after migration to remote host

* [Qemu-devel] Live migration hangs after migration to remote host
@ 2015-07-28 13:22 Eduardo Otubo
  2015-07-28 15:19 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Eduardo Otubo @ 2015-07-28 13:22 UTC (permalink / raw)
  To: Qemu-devel

Hello all,

I'm seeing some weird behavior in my tests: I am able to live migrate
between two virtual machines on my localhost, but not to another
machine, both using tcp.

* I am using the same arguments on the command line;
* Both virtual machines use the same qcow2 file visible through NFS;
* Both machines are in the same subnet;
* Migration is being done from intel to intel;
* Same version of Qemu (github master - f8787f8723);

With all of the above I am able to live migrate on the same host: between two
vms on the local host or between two vms on the remote host; but when
migrating from local to remote, the guest hangs. I can still access its
console via ctrl+alt+2, though, and everything seems to be normal. If I
issue a reboot via the console on the remote, the guest gets back to
normal.

Am I missing something here?

Regards,

-- 
Eduardo Otubo
ProfitBricks GmbH


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-28 13:22 [Qemu-devel] Live migration hangs after migration to remote host Eduardo Otubo
@ 2015-07-28 15:19 ` Dr. David Alan Gilbert
  2015-07-29  8:03   ` Eduardo Otubo
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-28 15:19 UTC (permalink / raw)
  To: Qemu-devel

* Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> Hello all,
> 
> I'm seeing some weird behavior in my tests: I am able to live migrate
> between two virtual machines on my localhost, but not to another
> machine, both using tcp.
> 
> * I am using the same arguments on the command line;
> * Both virtual machines use the same qcow2 file visible through NFS;
> * Both machines are in the same subnet;
> * Migration is being done from intel to intel;
> * Same version of Qemu (github master - f8787f8723);
> 
> With all of the above I am able to live migrate on the same host: between two
> vms on the local host or between two vms on the remote host; but when
> migrating from local to remote, the guest hangs. I can still access its
> console via ctrl+alt+2, though, and everything seems to be normal. If I
> issue a reboot via the console on the remote, the guest gets back to
> normal.
> 
> Am I missing something here?

Just checking, but are you saying that as far as qemu is concerned, the migration
is happy, it's just the guest that's hung?

Are the host clocks on the two hosts very close (there are lots of
weird corner cases with mismatched clocks) - same time zone?

Are you using cache=none (given that it's NFS shared)

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-28 15:19 ` Dr. David Alan Gilbert
@ 2015-07-29  8:03   ` Eduardo Otubo
  2015-07-29  8:11     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Eduardo Otubo @ 2015-07-29  8:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Qemu-devel

On Tue, Jul 28, 2015 at 04:19:46PM +0100, Dr. David Alan Gilbert wrote:
> Just checking, but are you saying that as far as qemu is concerned, the migration
> is happy, it's just the guest that's hung?

That's exactly the case. The console (via ctrl+alt+2) is active and
responding to all commands normally, but the screen (ctrl+alt+1) is
frozen and I can't interact with it at all.

> 
> Are the host clocks on the two hosts very close (there are lots of
> weird corner cases with mismatched clocks) - same time zone?

Yep. Both machines are in the same room and have the clock sync'ed.

> 
> Are you using cache=none (given that it's NFS shared)

I wasn't. But I tried again with cache=none and I got exactly the same
thing.

Also, I tried with the stable-2.2 branch and got the same behavior. I really
think it's very unlikely that such an important feature would be broken
upstream or on a stable- branch. Most probably I have something wrong in
my environment.

Anyway, I'll keep testing different stable- branches until I find
something that works for me. I'll keep the mailing list posted.

Thanks for the light!

-- 
Eduardo Otubo
ProfitBricks GmbH


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29  8:03   ` Eduardo Otubo
@ 2015-07-29  8:11     ` Dr. David Alan Gilbert
  2015-07-29  8:41       ` Eduardo Otubo
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-29  8:11 UTC (permalink / raw)
  To: Qemu-devel

* Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> > Just checking, but are you saying that as far as qemu is concerned, the migration
> > is happy, it's just the guest that's hung?
> 
> That's exactly the case. The console (via ctrl+alt+2) is active and
> responding to all commands normally, but the screen (ctrl+alt+1) is
> frozen and I can't interact with it at all.

Are you driving this via libvirt or using qemu monitor directly?
If the latter, can you please get an 'info migrate' from the source
and an 'info status' from the destination at the end of migrate.
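
For setups that expose a QMP socket instead of (or alongside) the human
monitor, the same two pieces of information are available through the QMP
commands query-migrate and query-status. The sketch below is only an
illustration: it assumes QEMU was started with something like
-qmp unix:/tmp/qmp.sock,server,nowait (the socket path is hypothetical),
and the helper name qmp_query is made up for this example.

```python
import json
import socket


def qmp_query(path, commands):
    """Run QMP commands over a UNIX socket and return {command: reply}.

    Hypothetical helper: 'path' is wherever the -qmp socket was placed.
    """
    replies = {}
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        stream = sock.makefile("rw")
        try:
            # The server speaks first with a greeting banner.
            json.loads(stream.readline())
            # Capabilities must be negotiated before any other command.
            for name in ["qmp_capabilities"] + list(commands):
                stream.write(json.dumps({"execute": name}) + "\n")
                stream.flush()
                # Skip asynchronous events until the actual reply arrives.
                while True:
                    msg = json.loads(stream.readline())
                    if "return" in msg or "error" in msg:
                        break
                replies[name] = msg
        finally:
            stream.close()
    return replies
```

On the source one would ask for ["query-migrate"], and on the destination
for ["query-status"], once the migration has finished.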

> > Are the host clocks on the two hosts very close (there are lots of
> > weird corner cases with mismatched clocks) - same time zone?
> 
> Yep. Both machines are in the same room and have the clock sync'ed.

OK, good.

> > 
> > Are you using cache=none (given that it's NFS shared)
> 
> I wasn't. But I tried again with cache=none and I got exactly the same
> thing.

OK, and this pair of machines, have you tried both directions - i.e.
going a->b and b->a - do both directions fail?
Is the NFS server one of the two machines?  If it is, and you're using libvirt,
make sure that the directory the disks are on is an NFS mount on both
machines; e.g. don't migrate directly from the NFS export.

> Also, I tried with the stable-2.2 branch and got the same behavior. I really
> think it's very unlikely that such an important feature would be broken
> upstream or on a stable- branch. Most probably I have something wrong in
> my environment.

Yes, the challenge is to find what; and if it's something common
we should try and find a way of spotting it.

> Anyway, I'll keep testing different stable- branches until I find
> something that works for me. I'll keep the mailing list posted.

Could you share the qemu command line so we can see if we can
spot anything?

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29  8:11     ` Dr. David Alan Gilbert
@ 2015-07-29  8:41       ` Eduardo Otubo
  2015-07-29  9:32         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Eduardo Otubo @ 2015-07-29  8:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Qemu-devel

On Wed, Jul 29, 2015 at 09:11:21AM +0100, Dr. David Alan Gilbert wrote:
> Are you driving this via libvirt or using qemu monitor directly?
> If the latter, can you please get an 'info migrate' from the source
> and an 'info status' from the destination at the end of migrate.

I'm using qemu command line directly. And I got the problem :) See
below.

> Could you share the qemu command line so we can see if we can
> spot anything?

Got the problem! I tried to simplify my qemu command line to the
smallest possible, excluding things I thought could cause the issue.
Without further ado, this is the argument:

    -cpu 'Opteron_G4'

Without this argument everything works as it should, console responsive
and guest active :)

The documentation[1] says that it's possible to migrate between
AMD and Intel, but I think I hit a corner case. Apparently I can't
specify the exact CPU model. Is this a known issue? I couldn't find any
reference on bugzilla or launchpad.

[1] - http://www.linux-kvm.org/page/Migration

-- 
Eduardo Otubo
ProfitBricks GmbH


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29  8:41       ` Eduardo Otubo
@ 2015-07-29  9:32         ` Dr. David Alan Gilbert
  2015-07-29 10:09           ` Eduardo Otubo
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-29  9:32 UTC (permalink / raw)
  To: Qemu-devel

* Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> Got the problem! I tried to simplify my qemu command line to the
> smallest possible, excluding things I thought could cause the issue.
> Without further ado, this is the argument:
> 
>     -cpu 'Opteron_G4'
> 
> Without this argument everything works as it should, console responsive
> and guest active :)

Can you show cat /proc/cpuinfo off the two hosts?
(Only one CPU, but please include the whole entry)

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29  9:32         ` Dr. David Alan Gilbert
@ 2015-07-29 10:09           ` Eduardo Otubo
  2015-07-29 10:38             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Eduardo Otubo @ 2015-07-29 10:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Qemu-devel

On Wed, Jul 29, 2015 at 10:32:59AM +0100, Dr. David Alan Gilbert wrote:
> Can you show cat /proc/cpuinfo off the two hosts?
> (Only one CPU, but please include the whole entry)

Intel host:
    processor   : 7
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 60
    model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
    stepping    : 3
    microcode   : 0x1c
    cpu MHz     : 883.468
    cache size  : 8192 KB
    physical id : 0
    siblings    : 8
    core id     : 3
    cpu cores   : 4
    apicid      : 7
    initial apicid  : 7
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 13
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
    bugs        :
    bogomips    : 6784.87
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 39 bits physical, 48 bits virtual
    power management:

AMD host:
    processor   : 5
    vendor_id   : AuthenticAMD
    cpu family  : 16
    model       : 10
    model name  : AMD Phenom(tm) II X6 1075T Processor
    stepping    : 0
    microcode   : 0x10000bf
    cpu MHz     : 800.000
    cache size  : 512 KB
    physical id : 0
    siblings    : 6
    core id     : 5
    cpu cores   : 6
    apicid      : 5
    initial apicid  : 5
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 6
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
    bogomips    : 6027.25
    TLB size    : 1024 4K pages
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate cpb

-- 
Eduardo Otubo
ProfitBricks GmbH


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29 10:09           ` Eduardo Otubo
@ 2015-07-29 10:38             ` Dr. David Alan Gilbert
  2015-07-29 12:47               ` Eduardo Otubo
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-29 10:38 UTC (permalink / raw)
  To: Qemu-devel

* Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> Intel host:
>     processor   : 7
>     vendor_id   : GenuineIntel
>     cpu family  : 6
>     model       : 60
>     model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>     stepping    : 3
>     microcode   : 0x1c
>     cpu MHz     : 883.468
>     cache size  : 8192 KB
>     physical id : 0
>     siblings    : 8
>     core id     : 3
>     cpu cores   : 4
>     apicid      : 7
>     initial apicid  : 7
>     fpu     : yes
>     fpu_exception   : yes
>     cpuid level : 13
>     wp      : yes
>     flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
>     bugs        :
>     bogomips    : 6784.87
>     clflush size    : 64
>     cache_alignment : 64
>     address sizes   : 39 bits physical, 48 bits virtual
>     power management:
> 
> AMD host:
>     processor   : 5
>     vendor_id   : AuthenticAMD
>     cpu family  : 16
>     model       : 10
>     model name  : AMD Phenom(tm) II X6 1075T Processor
>     stepping    : 0
>     microcode   : 0x10000bf
>     cpu MHz     : 800.000
>     cache size  : 512 KB
>     physical id : 0
>     siblings    : 6
>     core id     : 5
>     cpu cores   : 6
>     apicid      : 5
>     initial apicid  : 5
>     fpu     : yes
>     fpu_exception   : yes
>     cpuid level : 6
>     wp      : yes
>     flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
>     bogomips    : 6027.25
>     TLB size    : 1024 4K pages
>     clflush size    : 64
>     cache_alignment : 64
>     address sizes   : 48 bits physical, 48 bits virtual
>     power management: ts ttp tm stc 100mhzsteps hwpstate cpb

OK, very different CPUs.  My guess is that one or both of them don't support
some feature of the Opteron_G4.  When specifying -cpu it's often best
to use the enforce option.

What happens if you try:

qemu-system-x86_64 -machine pc,accel=kvm -cpu Opteron_G4,enforce=on -nographic

on both hosts?
You need to pick a CPU model that starts with enforce=on on both of the hosts.
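
One rough way to see which features the two hosts have in common is to
intersect the 'flags' lines of their /proc/cpuinfo output. The following is
only a sketch in Python: the helper names parse_flags and common_flags are
made up for illustration, and the flag strings are abbreviated excerpts of
the two entries quoted above. Note that /proc/cpuinfo also lists
kernel-synthesised flags (rep_good, nopl, ...) that are not guest CPUID
features, so treat the result as a hint, not as what enforce will accept.

```python
def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(*cpuinfo_texts):
    """Return the flags that are present on every host."""
    flag_sets = [parse_flags(text) for text in cpuinfo_texts]
    return set.intersection(*flag_sets) if flag_sets else set()


# Abbreviated excerpts of the two 'flags' lines quoted above (not the full lists).
intel = "flags       : fpu sse sse2 pni ssse3 cx16 sse4_1 sse4_2 popcnt aes avx"
amd = "flags       : fpu sse sse2 pni cx16 popcnt sse4a misalignsse"

print(sorted(common_flags(intel, amd)))
```

With the full output of both hosts saved to files, the same functions can be
fed the file contents instead of the abbreviated strings.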

Dave


> > Dave
> > 
> > > The documentation[1] says that it's possible to migrate between
> > > AMD and Intel, but I think I hit a corner case. Apparently I can't
> > > specify the exact CPU model. Is this a known issue? I couldn't find any
> > > reference on Bugzilla or Launchpad.
> > > 
> > > [1] - http://www.linux-kvm.org/page/Migration
> > > 
> > > -- 
> > > Eduardo Otubo
> > > ProfitBricks GmbH
> > 
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> Eduardo Otubo
> ProfitBricks GmbH


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29 10:38             ` Dr. David Alan Gilbert
@ 2015-07-29 12:47               ` Eduardo Otubo
  2015-07-29 14:21                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Eduardo Otubo @ 2015-07-29 12:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Qemu-devel

On Wed, Jul 29, 2015 at 11:38:44AM +0100, Dr. David Alan Gilbert wrote:
> OK, very different CPUs.  My guess is that one or both of them don't support
> some feature of the Opteron_G4.  When specifying -cpu it's often best
> to use the enforce option.
> 
> What happens if you try:
> 
> qemu-system-x86_64 -machine pc,accel=kvm -cpu Opteron_G4,enforce=on -nographic

This is the script I'm using right now on both hosts:

    otubo@vader ~ # cat startvm.sh 
    #!/bin/bash
    
    /home/otubo/develop/qemu/github/x86_64-softmmu/qemu-system-x86_64 \
        -machine pc,accel=kvm -cpu Opteron_G4,enforce=on \
        -name 'virt-tests-vm1'  \
        -sandbox off  \
        -display sdl \
        -drive id=drive_image1,cache=none,if=none,file=$1 \
        -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
        -device virtio-net-pci,mac=9a:22:23:24:25:26,id=idqE7Ggl,vectors=4,netdev=idjYAneH,bus=pci.0,addr=05  \
        -netdev user,id=idjYAneH,hostfwd=tcp::5001-:22  \
        -m 2G,slots=32,maxmem=10G \
        -smp 2,maxcpus=10,cores=1,threads=1,sockets=2  \
        -boot order=cdn,once=c,menu=off \
        -enable-kvm

> 
> on both hosts?

The output follows,
Intel host:

    otubo@vader ~ # ./startvm.sh /media/virt_images/pb-debian-7-server-latest.qcow2 
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
    qemu-system-x86_64: Host doesn't support requested features


AMD host:

    otubo@qemu-kvm-testrunner [2015-07-29 14:41:40] ~ # ./startvm-incoming.sh /media/virt_images/pb-debian-7-server-latest.qcow2 
    warning: host doesn't support requested feature: CPUID.01H:ECX.pclmulqdq|pclmuldq [bit 1]
    warning: host doesn't support requested feature: CPUID.01H:ECX.ssse3 [bit 9]
    warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.1|sse4_1 [bit 19]
    warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.2|sse4_2 [bit 20]
    warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
    warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
    warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
    warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
    warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
    qemu-system-x86_64: Host doesn't support requested features
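
To compare the two lists of warnings more easily, the feature names can be
pulled out of the qemu output and intersected. This is just a sketch in
Python: missing_features and WARNING_RE are names invented here, and the
regular expression assumes the warning format is exactly the one shown in
the output above.

```python
import re

# Matches e.g. "... requested feature: CPUID.80000001H:ECX.sse4a [bit 6]"
WARNING_RE = re.compile(
    r"host doesn't support requested feature: "
    r"CPUID\.[0-9A-Fa-f]+H:[A-Z]+\.(\S+) \[bit \d+\]"
)


def missing_features(qemu_output):
    """Return the feature names qemu reported as unsupported, in order."""
    return [match.group(1) for match in WARNING_RE.finditer(qemu_output)]


intel_output = """\
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
"""

amd_output = """\
warning: host doesn't support requested feature: CPUID.01H:ECX.pclmulqdq|pclmuldq [bit 1]
warning: host doesn't support requested feature: CPUID.01H:ECX.ssse3 [bit 9]
warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.1|sse4_1 [bit 19]
warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.2|sse4_2 [bit 20]
warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
"""

intel_missing = missing_features(intel_output)
amd_missing = missing_features(amd_output)

# Features that neither host can provide for this CPU model.
print(sorted(set(intel_missing) & set(amd_missing)))
```

Everything outside that intersection is missing on only one of the two
hosts.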

> You need to pick a CPU model that starts with enforce=on on both of the hosts.
> 

So you think it's just a matter of fine-tuning which CPU option is best for
live migration on each platform? Or should it be handled inside QEMU itself?

Regards,

-- 
Eduardo Otubo
ProfitBricks GmbH


* Re: [Qemu-devel] Live migration hangs after migration to remote host
  2015-07-29 12:47               ` Eduardo Otubo
@ 2015-07-29 14:21                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-29 14:21 UTC (permalink / raw)
  To: Qemu-devel

* Eduardo Otubo (eduardo.otubo@profitbricks.com) wrote:
> On Wed, Jul 29, 2015 at 11:38:44AM +0100, Dr. David Alan Gilbert wrote:
> > OK, very different CPUs.  My guess is that one or both of them don't support
> > some feature of the Opteron_G4.  When specifying -cpu it's often best
> > to use the enforce option.
> > 
> > What happens if you try:
> > 
> > qemu-system-x86_64 -machine pc,accel=kvm -cpu Opteron_G4,enforce=on -nographic
> 
> This is the script I'm using right now on both hosts:
> 
>     otubo@vader ~ # cat startvm.sh 
>     #!/bin/bash
>     
>     /home/otubo/develop/qemu/github/x86_64-softmmu/qemu-system-x86_64 \
>         -machine pc,accel=kvm -cpu Opteron_G4,enforce=on \
>         -name 'virt-tests-vm1'  \
>         -sandbox off  \
>         -display sdl \
>         -drive id=drive_image1,cache=none,if=none,file=$1 \
>         -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>         -device virtio-net-pci,mac=9a:22:23:24:25:26,id=idqE7Ggl,vectors=4,netdev=idjYAneH,bus=pci.0,addr=05  \
>         -netdev user,id=idjYAneH,hostfwd=tcp::5001-:22  \
>         -m 2G,slots=32,maxmem=10G \
>         -smp 2,maxcpus=10,cores=1,threads=1,sockets=2  \
>         -boot order=cdn,once=c,menu=off \
>         -enable-kvm
> 
> > 
> > on both hosts?
> 
> The output follows,
> Intel host:
> 
>     otubo@vader ~ # ./startvm.sh /media/virt_images/pb-debian-7-server-latest.qcow2 
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
>     qemu-system-x86_64: Host doesn't support requested features
> 
> 
> AMD host:
> 
>     otubo@qemu-kvm-testrunner [2015-07-29 14:41:40] ~ # ./startvm-incoming.sh /media/virt_images/pb-debian-7-server-latest.qcow2 
>     warning: host doesn't support requested feature: CPUID.01H:ECX.pclmulqdq|pclmuldq [bit 1]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.ssse3 [bit 9]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.1|sse4_1 [bit 19]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.2|sse4_2 [bit 20]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
>     warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
>     warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
>     warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
>     qemu-system-x86_64: Host doesn't support requested features

OK, so you're asking to emulate an Opteron_G4 when *neither* of your hosts supports
all of its features!  I know some people want enforce=on to be the default to stop
people making this mistake; it seems like a good idea, and it would have saved you
some time.

> > You need to pick a CPU model that starts with enforce=on on both of the hosts.
> > 
> 
> So you think it's just a matter of fine-tuning which CPU option is best for
> live migration on each platform? Or should it be handled inside QEMU itself?

This isn't just a live migration problem; it can break even on a single host if
the guest starts using some of the features without checking the CPU flags.
As long as you use 'enforce', qemu will refuse to start on a host that lacks
a requested feature, rather than giving you this type of failure later on.
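
The difference can be pictured roughly as follows. This is a toy model in
Python, not qemu's actual code: start_guest, its parameters and the feature
sets are all invented for illustration, and it assumes that without
'enforce' the unsupported features are simply dropped from what the guest
sees.

```python
def start_guest(requested, host_supported, enforce=False):
    """Toy model: return the feature set the guest would end up seeing.

    With enforce=True, refuse to start if any requested feature is missing.
    Otherwise, drop the missing features and carry on.
    """
    missing = sorted(set(requested) - set(host_supported))
    if missing and enforce:
        raise RuntimeError(
            "Host doesn't support requested features: " + ", ".join(missing)
        )
    return set(requested) & set(host_supported)


# Hypothetical feature sets, loosely based on the warnings in this thread.
model = {"sse2", "ssse3", "avx", "xop", "fma4"}
intel_host = {"sse2", "ssse3", "avx"}
amd_host = {"sse2"}

# Same command line, but each host hands the guest a different CPU.
print(sorted(start_guest(model, intel_host)))
print(sorted(start_guest(model, amd_host)))

# With enforce, the mismatch is reported at startup instead.
try:
    start_guest(model, amd_host, enforce=True)
except RuntimeError as err:
    print(err)
```

In the non-enforce case the two hosts present different feature sets for the
same -cpu argument, which is presumably what upset the guest after the
migration.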

I'm not sure which CPU model would work on both of those hosts;
Penryn, perhaps?

Dave
> 
> Regards,
> 
> -- 
> Eduardo Otubo
> ProfitBricks GmbH


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Thread overview:
2015-07-28 13:22 [Qemu-devel] Live migration hangs after migration to remote host Eduardo Otubo
2015-07-28 15:19 ` Dr. David Alan Gilbert
2015-07-29  8:03   ` Eduardo Otubo
2015-07-29  8:11     ` Dr. David Alan Gilbert
2015-07-29  8:41       ` Eduardo Otubo
2015-07-29  9:32         ` Dr. David Alan Gilbert
2015-07-29 10:09           ` Eduardo Otubo
2015-07-29 10:38             ` Dr. David Alan Gilbert
2015-07-29 12:47               ` Eduardo Otubo
2015-07-29 14:21                 ` Dr. David Alan Gilbert
