From: Jan Kiszka
Date: Thu, 03 Dec 2009 20:10:19 +0100
To: Luiz Capitulino
Cc: "qemu-devel@nongnu.org", "lirans@il.ibm.com"
Subject: [Qemu-devel] Re: [STAGING]: Block migration segfaults

Luiz Capitulino wrote:
> On Thu, 03 Dec 2009 19:36:15 +0100
> Jan Kiszka wrote:
>
>> Luiz Capitulino wrote:
>>> Hi there,
>>>
>>> Got this while testing block migration in staging:
>>>
>>> """
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
>>>     ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>>> 192         if (mon->mc && !mon->mc->print_enabled) {
>>> """
>>>
>>> The problem here is that init_blk_migration() calls monitor_printf() with
>>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>>> chain.
>> What is the backtrace? And how did you start the migration?
>
> Started the source VM with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
>   -balloon virtio -m 1G -S
>
> and the destination one with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
>   -balloon virtio -m 1G -S -incoming tcp:0:4444
>
> Migration command issued:
>
> (QEMU) migrate -d -b tcp:0:4444
>

Ah, forgot '-d'! Yes, that was precisely the use case for suppressing
monitor output I was talking about. This depends on the monitor services
catching NULL properly, so please push the corresponding patch.

>
> I have no idea if this is correct and wondered if specifying the same image
> for the destination VM would be catastrophic.
> :)
>
> The backtrace follows and it's the source VM which segfaults:
>
> """
> #0  0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n",
> #1  0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n")
> #2  0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at block-migration.c:254
> #3  0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1, opaque=0xbea960)
> #4  0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130, blk_enable=1, shared=0)
> #5  0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279
> #6  0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at migration-tcp.c:72
> #7  0x000000000040c3f8 in main_loop_wait (timeout=5000) at /home/lcapitulino/src/aliguori-queue/vl.c:3875
> #8  0x000000000040ca17 in main_loop () at /home/lcapitulino/src/aliguori-queue/vl.c:4095
> #9  0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8, envp=0x7fff6f838c50)
> """
>
> [...]
>
>>> A patch from me (4a29a in staging) changes a higher level monitor
>>> function to touch 'mon' before passing it down and here's the segfault.
>>>
>>> Now, the point is: I could give the old behavior back but I think we're
>>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>> If there is no monitor associated with the current context, it can be
>> NULL. This is mostly the case during early startup.
>
> Why would one call monitor_printf() before monitor initialization?

Via shared code e.g.

>
>> One may also use
>> this to suppress output (though I don't recall any real case ATM).
>
> I would prefer functions like monitor_disable()/monitor_enable()
> for this...

They have different meanings.

>
>>> Anyways, the following patch adds the old behavior back just in case
>>> you want to see it working...
>> Yes, better restore the check.
>> Still, your call stack would be
>> interesting. Maybe there is actually another bug behind it.
>
> Ok, I'll send the fix because it's not that important right now,
> but I'm not convinced this is the right thing to do.

Thanks,
Jan
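For reference, a minimal sketch of the kind of NULL check discussed in this thread. It assumes a heavily simplified, hypothetical Monitor structure with a flat print_enabled flag and a capture buffer; QEMU's real Monitor, its mon->mc->print_enabled layout and its void return types all differ, and the byte-count return values here exist only so the behaviour can be checked. With the guard in place, a caller such as init_blk_migration() running under a detached 'migrate -d' can pass mon == NULL and the message is simply dropped, while "a monitor whose output is disabled" stays a separate case handled by the flag.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for QEMU's Monitor.
 * Output is captured in buf so the behaviour can be checked. */
typedef struct Monitor {
    int print_enabled;   /* 0: monitor exists but output is suppressed */
    char buf[256];       /* captured output */
    size_t len;          /* bytes currently stored in buf */
} Monitor;

/* Returns the number of bytes stored, or 0 if the message was dropped. */
int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
{
    size_t room;
    int n;

    assert(fmt != NULL);  /* mon may be NULL, fmt may not */

    /* No monitor attached to this context (early startup, or a
     * detached "migrate -d"): drop the message instead of
     * dereferencing NULL. */
    if (mon == NULL || !mon->print_enabled) {
        return 0;
    }

    room = sizeof(mon->buf) - mon->len;
    n = vsnprintf(mon->buf + mon->len, room, fmt, ap);
    if (n < 0) {
        return 0;  /* encoding error: nothing is counted */
    }
    if ((size_t)n >= room) {
        n = (int)(room - 1);  /* output was truncated to fit */
    }
    mon->len += (size_t)n;
    return n;
}

int monitor_printf(Monitor *mon, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = monitor_vprintf(mon, fmt, ap);
    va_end(ap);
    return n;
}
```

Whether NULL should stay a legal argument, or be replaced by explicit monitor_disable()/monitor_enable() calls as Luiz suggests, is the open design question here; the sketch only shows the defensive variant.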