Luiz Capitulino wrote: > On Thu, 03 Dec 2009 19:36:15 +0100 > Jan Kiszka wrote: > >> Luiz Capitulino wrote: >>> Hi there, >>> >>> Got this while testing block migration in staging: >>> >>> """ >>> Program terminated with signal 11, Segmentation fault. >>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n", >>> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192 >>> 192 if (mon->mc && !mon->mc->print_enabled) { >>> """ >>> >>> The problem here is that init_blk_migration() calls monitor_printf() with >>> a NULL 'mon' and the backtrace shows that this is true for the entire call >>> chain. >> What is the backtrace? And how did you start the migration? > > Started the source VM with: > > # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \ > -balloon virtio -m 1G -S > > and the destination one with: > > # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \ > -balloon virtio -m 1G -S -incoming tcp:0:4444 > > Migration command issued: > > (QEMU) migrate -d -b tcp:0:4444 > Ah, forgot '-d'! Yes, that was precisely the use case for suppressing monitor output I was talking about. This depends on the monitor services catching NULL properly, so please push the corresponding patch. > > I have no idea if this correct and wondered if specifying the same image > for the destination VM would be catastrophic. :) > > The backtrace follows and it's the source VM which segfaults: > > """ > #0 0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n", > #1 0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n") > #2 0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at block-migration.c:254 > #3 0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1, opaque=0xbea960) > #4 0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130, blk_enable=1, shared=0) > #5 0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279 > #6 0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at migration-tcp.c:72 > #7 0x000000000040c3f8 in main_loop_wait (timeout=5000) at /home/lcapitulino/src/aliguori-queue/vl.c:3875 > #8 0x000000000040ca17 in main_loop () at /home/lcapitulino/src/aliguori-queue/vl.c:4095 > #9 0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8, envp=0x7fff6f838c50) > """ > > [...] > >>> A patch from me (4a29a in staging) changes a higher level monitor >>> function to touch 'mon' before passing it down and here's the segfault. >>> >>> Now, the point is: I could give the old behavior back but I think we're >>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'? >> If there is no monitor associated with the current context, it can be >> NULL. This is mostly the case during early startup. > > Why would one call monitor_printf() before monitor initialization? Via shared code e.g. > >> One may also use >> this to suppress output (though I don't recall any real case ATM). > > I would prefer functions like monitor_disable()/monitor_enable() > for this... They have different meanings. > >>> Anyways, the following patch adds the old behavior back just in case >>> you want to see it working... >> Yes, better restore the check. Still, your call stack would be >> interesting. Maybe there is actual another bug behind it. > > Ok, I'll send the fix because it's not that important right now, > but I'm not convinced this is the right thing to do. Thanks, Jan