* [Qemu-devel] Bug in recent postcopy patch
@ 2014-10-29 22:27 Gary Hook
  2014-10-30 10:03 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 6+ messages in thread
From: Gary Hook @ 2014-10-29 22:27 UTC
To: qemu-devel@nongnu.org

*Knock* *knock* *knock* Is this thing on?

I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am
poking around. An attempt to migrate results in a NULL pointer dereference
in savevm.c. Here is info from gdb:

Most of qemu_savevm_state_pending() succeeds, until it gets to the end.
Here's the relevant thread while calling is_active():

(gdb) backtrace
#0  block_is_active (opaque=0x7fb0ae721200 <block_mig_state>) at block-migration.c:860
#1  0x00007fb0adf4a13a in qemu_savevm_state_pending (f=0x7fb0b01e3a40,
    max_size=max_size@entry=0,
    res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90,
    res_postcopiable=res_postcopiable@entry=0x7fb09d604c88)
    at /home/hook/src/qemu/postcopy2/savevm.c:983
#2  0x00007fb0ae01bd82 in migration_thread (opaque=0x7fb0ae684420 <current_migration>) at migration.c:1185
#3  0x00007fb0a824d182 in start_thread (arg=0x7fb09d605700) at pthread_create.c:312
#4  0x00007fb0a7f79fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Q: why is max_size == 0? Does this seem correct?

We look at se->ops:

(gdb) print *se->ops
$9 = {set_params = 0x7fb0ae028820 <block_set_params>, save_state = 0x0,
  cancel = 0x7fb0ae028f50 <block_migration_cancel>,
  save_live_complete = 0x7fb0ae0299a0 <block_save_complete>,
  is_active = 0x7fb0ae028870 <block_is_active>,
  save_live_iterate = 0x7fb0ae029480 <block_save_iterate>,
  save_live_setup = 0x7fb0ae029330 <block_save_setup>,
  save_live_pending = 0x7fb0ae028b30 <block_save_pending>,
  can_postcopy = 0x0, load_state = 0x7fb0ae0288b0 <block_load>}

Why is can_postcopy() NULL?
(gdb) n
qemu_savevm_state_pending (f=0x7fb0b01e3a40, max_size=max_size@entry=0,
    res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90,
    res_postcopiable=res_postcopiable@entry=0x7fb09d604c88)
    at /home/hook/src/qemu/postcopy2/savevm.c:989
989             if (se->ops->can_postcopy(se->opaque)) {
(gdb) print *se
$14 = {entry = {tqe_next = 0x7fb0aff9ab30, tqe_prev = 0x7fb0aff88f20},
  idstr = "block", '\000' <repeats 250 times>, instance_id = 0,
  alias_id = 0, version_id = 1, section_id = 1,
  ops = 0x7fb0ae6848e0 <savevm_block_handlers>, vmsd = 0x0,
  opaque = 0x7fb0ae721200 <block_mig_state>, compat = 0x0, is_ram = 1}
(gdb) step

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb)

The patches appear to have been fully applied, but it would seem that the
savevm_block_handlers structure needs to be updated to populate this
field? Which implies that a new function will have to be written?

Or, if I have missed the obvious, I would appreciate enlightenment.

Thanks,
Gary
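[Editorial sketch] One plausible reading of the `can_postcopy = 0x0` value in the gdb dump: if a handler table is filled in with designated initializers and a newly added member is simply not named, C initialises that member to a null pointer. The minimal example below assumes that pattern. `DemoHandlers`, `demo_is_active`, `demo_block_handlers` and `demo_hook_is_unset` are hypothetical names made up for illustration and are not taken from the QEMU tree or the patch series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a table of optional callbacks. */
typedef struct DemoHandlers {
    bool (*is_active)(void *opaque);
    bool (*can_postcopy)(void *opaque);   /* newly added, optional hook */
} DemoHandlers;

static bool demo_is_active(void *opaque)
{
    (void)opaque;
    return true;
}

/*
 * Only is_active is named here. Members left out of an initializer
 * list are zero-initialised, so can_postcopy ends up as a null pointer.
 */
static const DemoHandlers demo_block_handlers = {
    .is_active = demo_is_active,
};

/* Returns true when the optional hook was never filled in. */
static bool demo_hook_is_unset(const DemoHandlers *ops)
{
    return ops->can_postcopy == NULL;
}
```

Calling `demo_block_handlers.can_postcopy(...)` directly would jump to address zero, which is consistent with the `0x0000000000000000 in ?? ()` frame reported above.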
* Re: [Qemu-devel] Bug in recent postcopy patch
  2014-10-29 22:27 [Qemu-devel] Bug in recent postcopy patch Gary Hook
@ 2014-10-30 10:03 ` Dr. David Alan Gilbert
  2014-10-30 16:49   ` Gary Hook
  0 siblings, 1 reply; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-30 10:03 UTC
To: Gary Hook; +Cc: qemu-devel@nongnu.org

* Gary Hook (gary.hook@nimboxx.com) wrote:
> *Knock* *knock* *knock* Is this thing on?

Yes - but only by luck did I notice this; it's normally better
to reply to the thread that posted a patch and cc the authors!

> I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am
> poking around. An attempt to migrate results in a NULL pointer dereference
> in savevm.c. Here is info from gdb:

I've not tried migrating with block migration; so can you
show the command line you used on qemu and the sequence of commands
you used to trigger the migration?

> Most of qemu_savevm_state_pending() succeeds, until it gets to the end.
> Here's the relevant thread while calling is_active():
>
> (gdb) backtrace
> #0 block_is_active (opaque=0x7fb0ae721200 <block_mig_state>) at
> block-migration.c:860
> #1 0x00007fb0adf4a13a in qemu_savevm_state_pending (f=0x7fb0b01e3a40,
> max_size=max_size@entry=0,
> res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90,
> res_postcopiable=res_postcopiable@entry=0x7fb09d604c88)
> at /home/hook/src/qemu/postcopy2/savevm.c:983
> #2 0x00007fb0ae01bd82 in migration_thread (opaque=0x7fb0ae684420
> <current_migration>) at migration.c:1185
> #3 0x00007fb0a824d182 in start_thread (arg=0x7fb09d605700) at
> pthread_create.c:312
> #4 0x00007fb0a7f79fbd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Q: why is max_size == 0? Does this seem correct?

Yes, I think that's normal for the 1st time through the loop; (see
migration_thread near the start max_size is initialised to 0).
> We look at se->ops: > > (gdb) print *se->ops > $9 = {set_params = 0x7fb0ae028820 <block_set_params>, save_state = 0x0, > cancel = 0x7fb0ae028f50 <block_migration_cancel>, > save_live_complete = 0x7fb0ae0299a0 <block_save_complete>, is_active = > 0x7fb0ae028870 <block_is_active>, > save_live_iterate = 0x7fb0ae029480 <block_save_iterate>, save_live_setup > = 0x7fb0ae029330 <block_save_setup>, > save_live_pending = 0x7fb0ae028b30 <block_save_pending>, can_postcopy = > 0x0, load_state = 0x7fb0ae0288b0 <block_load>} > > Why is can_postcopy() NULL? > > (gdb) n > qemu_savevm_state_pending (f=0x7fb0b01e3a40, max_size=max_size@entry=0, > res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90, > res_postcopiable=res_postcopiable@entry=0x7fb09d604c88) at > /home/hook/src/qemu/postcopy2/savevm.c:989 > 989 if (se->ops->can_postcopy(se->opaque)) { > (gdb) print *se > $14 = {entry = {tqe_next = 0x7fb0aff9ab30, tqe_prev = 0x7fb0aff88f20}, > idstr = "block", '\000' <repeats 250 times>, instance_id = 0, > alias_id = 0, version_id = 1, section_id = 1, ops = 0x7fb0ae6848e0 > <savevm_block_handlers>, vmsd = 0x0, > opaque = 0x7fb0ae721200 <block_mig_state>, compat = 0x0, is_ram = 1} > (gdb) step > > Program received signal SIGSEGV, Segmentation fault. > 0x0000000000000000 in ?? () > (gdb) > > > The patches appear to have been fully applied, but it would seem that the > savevm_block_handlers structure needs to be updated to populate this > field? Which implies that a new function will have to be written? > > Or, if I have missed the obvious, I would appreciate enlightenment. Simple bug on my part; the line: if (se->ops->can_postcopy(se->opaque)) { needs to become: if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) { Thanks for the report. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 6+ messages in thread
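[Editorial sketch] The one-line fix quoted above treats `can_postcopy` as an optional hook: test the pointer first, and only call through it when it is set. A minimal, self-contained illustration of that guard follows. `DemoOps`, `DemoEntry`, `demo_entry_can_postcopy` and the two sample tables are hypothetical names invented for this example; the real structures in savevm.c have many more members, and the choice to read a missing hook as "cannot postcopy" is taken from the `&&` in the fix rather than from any further documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the structures seen in the gdb output. */
typedef struct DemoOps {
    bool (*can_postcopy)(void *opaque);   /* optional hook, may be NULL */
} DemoOps;

typedef struct DemoEntry {
    const DemoOps *ops;
    void *opaque;
} DemoEntry;

/*
 * Guarded call: the && short-circuits, so the function pointer is only
 * invoked when it is non-NULL. An unset hook yields false.
 */
static bool demo_entry_can_postcopy(const DemoEntry *se)
{
    assert(se != NULL && se->ops != NULL);
    return se->ops->can_postcopy && se->ops->can_postcopy(se->opaque);
}

static bool demo_always_yes(void *opaque)
{
    (void)opaque;
    return true;
}

/* One table that implements the hook and one that leaves it unset. */
static const DemoOps demo_ram_ops   = { .can_postcopy = demo_always_yes };
static const DemoOps demo_block_ops = { .can_postcopy = NULL };
```

With this guard, an entry built on `demo_block_ops` reports false instead of faulting, while an entry built on `demo_ram_ops` still reaches its hook.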
* Re: [Qemu-devel] Bug in recent postcopy patch 2014-10-30 10:03 ` Dr. David Alan Gilbert @ 2014-10-30 16:49 ` Gary Hook 2014-10-30 20:08 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 6+ messages in thread From: Gary Hook @ 2014-10-30 16:49 UTC (permalink / raw) To: qemu-devel@nongnu.org; +Cc: Dr. David Alan Gilbert On 10/30/14, 5:03 AM, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: >* Gary Hook (gary.hook@nimboxx.com) wrote: >> *Knock* *knock* *knock* Is this thing on? > >Yes - but only by luck did I notice this; it's normally better >to reply to the thread that posted a patch and cc the authors! Well, that depends upon the developers, I think. I was gently admonished on another list for addressing a developer (inadvertently) directly. But I appreciate your openness, and would not want to abuse your attention. >> I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am >> poking around. An attempt to migrate results in a NULL pointer >>dereference >> in savevm.c. Here is info from gdb: > >I've not tried migrating with block migration; so can you >show the command line you used on qemu and the sequence of commands >you used to trigger the migration? Yessir. We invoke the emulator from libvirt. 
While the problem we are dealing with applies to any VM, the one I am
working with is invoked thusly (edited for readability):

qemu-system-x86_64 -enable-kvm -name 88dbaf46-4692-4935-bd9d-8d8fac7725a9 \
  -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off \
  -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 88dbaf46-4692-4935-bd9d-8d8fac7725a9 -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/88dbaf46-4692-4935-bd9d-8d8fac7725a9.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime \
  -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=/mnt/store01/virt/88dbaf46-4692-4935-bd9d-8d8fac7725a9.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 \
  -netdev tap,fd=29,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:07:19:5e,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -vnc 127.0.0.1:0,password -device VGA,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
  -msg timestamp=on

I posted another thread asking about migration failure due to a copy
taking too long, but got no traction. In the case where the problem raises
its head we have turned tunneling on. A tiny VM (<2GB in size) migrates
fine using the same procedure. Again, no shared storage.

>>Q: why is max_size == 0? Does this seem correct?
>
>Yes, I think that's normal for the 1st time through the loop; (see
>migration_thread
>near the start max_size is initialised to 0).

Thank you; will do.
>> >> >> The patches appear to have been fully applied, but it would seem that >>the >> savevm_block_handlers structure needs to be updated to populate this >> field? Which implies that a new function will have to be written? >> >> Or, if I have missed the obvious, I would appreciate enlightenment. > >Simple bug on my part; the line: > > if (se->ops->can_postcopy(se->opaque)) { > >needs to become: > if (se->ops->can_postcopy && > se->ops->can_postcopy(se->opaque)) { I wondered if that were not the case. I will make that change and see what happens. >Thanks for the report. Thank you for your time and ownership. Gary ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Bug in recent postcopy patch 2014-10-30 16:49 ` Gary Hook @ 2014-10-30 20:08 ` Dr. David Alan Gilbert 2014-10-30 21:59 ` Gary Hook 0 siblings, 1 reply; 6+ messages in thread From: Dr. David Alan Gilbert @ 2014-10-30 20:08 UTC (permalink / raw) To: Gary Hook; +Cc: qemu-devel@nongnu.org * Gary Hook (gary.hook@nimboxx.com) wrote: > On 10/30/14, 5:03 AM, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > > >* Gary Hook (gary.hook@nimboxx.com) wrote: > >> *Knock* *knock* *knock* Is this thing on? > > > >Yes - but only by luck did I notice this; it's normally better > >to reply to the thread that posted a patch and cc the authors! > > Well, that depends upon the developers, I think. I was gently admonished > on another list for addressing a developer (inadvertently) directly. But I > appreciate your openness, and would not want to abuse your attention. > > >> I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am > >> poking around. An attempt to migrate results in a NULL pointer > >>dereference > >> in savevm.c. Here is info from gdb: > > > >I've not tried migrating with block migration; so can you > >show the command line you used on qemu and the sequence of commands > >you used to trigger the migration? > > Yessir. We invoke the emulator from libvirt. 
While the problem we are > dealing with applies to any VM, the one I am working with is invoked > thusly (edited for readability): > > qemu-system-x86_64 -enable-kvm -name 88dbaf46-4692-4935-bd9d-8d8fac7725a9 \ > -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off \ > -smp 1,sockets=1,cores=1,threads=1 \ > -uuid 88dbaf46-4692-4935-bd9d-8d8fac7725a9 -no-user-config -nodefaults \ > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/88dbaf46-4692-4935-bd9d-8d > 8fac7725a9.monitor,server,nowait \ > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime \ > -no-shutdown -boot strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ > -drive > file=/mnt/store01/virt/88dbaf46-4692-4935-bd9d-8d8fac7725a9.qcow2,if=none,i > d=drive-virtio-disk0,format=qcow2,cache=writeback \ > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virt > io-disk0,bootindex=1 \ > -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \ > -device > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 \ > -netdev tap,fd=29,id=hostnet0 -device > rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:07:19:5e,bus=pci.0,addr=0x3 \ > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 \ > -vnc 127.0.0.1:0,password -device VGA,id=video0,bus=pci.0,addr=0x2 \ > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \ > -msg timestamp=on > > I posted another thread asking about migration failure due to a copy > taking too long, but got no traction. In the case where the problem raises > its head we have turned tunneling on. A tiny VM (<2GB in size) migrates > fine using the same procedure. Again, no shared storage. Is the guest that doesn't migrate idle or is it busily changing lots of memory? > >>Q: why is max_size == 0? Does this seem correct? > > > >Yes, I think that's normal for the 1st time through the loop; (see > >migration_thread > >near the start max_size is initialised to 0). 
> > Thank you; will do. > > >> > >> > >> The patches appear to have been fully applied, but it would seem that > >>the > >> savevm_block_handlers structure needs to be updated to populate this > >> field? Which implies that a new function will have to be written? > >> > >> Or, if I have missed the obvious, I would appreciate enlightenment. > > > >Simple bug on my part; the line: > > > > if (se->ops->can_postcopy(se->opaque)) { > > > >needs to become: > > if (se->ops->can_postcopy && > > se->ops->can_postcopy(se->opaque)) { > > I wondered if that were not the case. I will make that change and see what > happens. > > >Thanks for the report. > > Thank you for your time and ownership. No problem; note the postcopy code is still quite young, so don't be too surprised if you hit other issues. Dave > > Gary > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Bug in recent postcopy patch
  2014-10-30 20:08 ` Dr. David Alan Gilbert
@ 2014-10-30 21:59   ` Gary Hook
  2014-10-31 12:04     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 6+ messages in thread
From: Gary Hook @ 2014-10-30 21:59 UTC
To: qemu-devel@nongnu.org; +Cc: Dr. David Alan Gilbert

On 10/30/14, 3:08 PM, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

>>I posted another thread asking about migration failure due to a copy
>> taking too long, but got no traction. In the case where the problem
>>raises
>> its head we have turned tunneling on. A tiny VM (<2GB in size) migrates
>> fine using the same procedure. Again, no shared storage.
>
>Is the guest that doesn't migrate idle or is it busily changing lots of
>memory?

Quite idle. Boot the VM, no need to start a workload, try to migrate.
Failure.

Also, very large VMs will fail to migrate (non-tunneled). This _seems_ to
also be related to the amount of time required to copy everything from A
to B.

Again, tunneling seems to more quickly expose this issue as it increases
the amount of time required to copy the qcow2 file over the network.

I will add here that I've watched the qcow2 file grow, made a copy of it
(on the receiving end) before it gets deleted, and been able to start a VM
using the file. It would seem to be copasetic.

I need to add tracing code to the emulator, in a way that doesn't rely
upon command line options or environment variables. I don't see any such
facility at this point. Specifically, I want to begin by watching what is
going through the monitor (I.e. Return values from qemu-system-x86_64 and
why there are complaints.) Unless you have any clear explanation as to why
the emulator is throwing an error, could you suggest any areas I may want
to focus my efforts?

>>
>> >Thanks for the report.
>>
>> Thank you for your time and ownership.
>
>No problem; note the postcopy code is still quite young, so don't
>be too surprised if you hit other issues.
Of course; it's fresh out of the oven. But the migration of VMs using
non-shared storage is not (tunneled or otherwise), and that's really what
I am focused on.

Again, much appreciation.

Gary
* Re: [Qemu-devel] Bug in recent postcopy patch
  2014-10-30 21:59 ` Gary Hook
@ 2014-10-31 12:04   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-31 12:04 UTC
To: Gary Hook; +Cc: qemu-devel@nongnu.org

* Gary Hook (gary.hook@nimboxx.com) wrote:
>
>
> On 10/30/14, 3:08 PM, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>
> >>I posted another thread asking about migration failure due to a copy
> >> taking too long, but got no traction. In the case where the problem
> >>raises
> >> its head we have turned tunneling on. A tiny VM (<2GB in size) migrates
> >> fine using the same procedure. Again, no shared storage.
> >
> >Is the guest that doesn't migrate idle or is it busily changing lots of
> >memory?
>
> Quite idle. Boot the VM, no need to start a workload, try to migrate.
> Failure.
>
> Also, very large VMs will fail to migrate (non-tunneled). This _seems_ to
> also be related to the amount of time required to copy everything from A
> to B.
>
> Again, tunneling seems to more quickly expose this issue as it increases
> the amount of time required to copy the qcow2 file over the network.
>
> I will add here that I've watched the qcow2 file grow, made a copy of it
> (on the receiving end) before it gets deleted, and been able to start a VM
> using the file. It would seem to be copasetic.
>
> I need to add tracing code to the emulator, in a way that doesn't rely
> upon command line options or environment variables. I don't see any such
> facility at this point. Specifically, I want to begin by watching what is
> going through the monitor (I.e. Return values from qemu-system-x86_64 and
> why there are complaints.) Unless you have any clear explanation as to why
> the emulator is throwing an error, could you suggest any areas I may want
> to focus my efforts?
No I don't, but there again I've not done any block stuff, and it sounds
like your problem is mostly related to moving the image file (which I
thought libvirt preferred to do using NBD underneath now, but again, I'm
not a block guy).

> >> >Thanks for the report.
> >>
> >> Thank you for your time and ownership.
> >
> >No problem; note the postcopy code is still quite young, so don't
> >be too surprised if you hit other issues.
>
> Of course; it's fresh out of the oven. But the migration of VMs using
> non-shared storage is not (tunneled or otherwise), and that's really what
> I am focused on.
>
> Again, much appreciation.

Dave

> Gary
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK