From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38863) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XjmaG-0002mw-5G for qemu-devel@nongnu.org; Thu, 30 Oct 2014 06:04:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XjmaB-0001SW-1L for qemu-devel@nongnu.org; Thu, 30 Oct 2014 06:04:00 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53414) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XjmaA-0001S3-Qy for qemu-devel@nongnu.org; Thu, 30 Oct 2014 06:03:54 -0400 Date: Thu, 30 Oct 2014 10:03:45 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20141030100344.GE2376@work-vm> References: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: Subject: Re: [Qemu-devel] Bug in recent postcopy patch List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gary Hook Cc: "qemu-devel@nongnu.org" * Gary Hook (gary.hook@nimboxx.com) wrote: > *Knock* *knock* *knock* Is this thing on? Yes - but only by luck did I notice this; it's normally better to reply to the thread that posted a patch and cc the authors! > I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am > poking around. An attempt to migrate results in a NULL pointer dereference > in savevm.c. Here is info from gdb: I've not tried migrating with block migration; so can you show the command line you used on qemu and the sequence of commands you used to trigger the migration? > Most of qemu_savevm_state_pending() succeeds, until it gets to the end. > Here=B9s the relevant thread while calling is_active(): >=20 > (gdb) backtrace > #0 block_is_active (opaque=3D0x7fb0ae721200 ) at > block-migration.c:860 > #1 0x00007fb0adf4a13a in qemu_savevm_state_pending (f=3D0x7fb0b01e3a40, > max_size=3Dmax_size@entry=3D0, > res_non_postcopiable=3Dres_non_postcopiable@entry=3D0x7fb09d604c90, > res_postcopiable=3Dres_postcopiable@entry=3D0x7fb09d604c88) > at /home/hook/src/qemu/postcopy2/savevm.c:983 > #2 0x00007fb0ae01bd82 in migration_thread (opaque=3D0x7fb0ae684420 > ) at migration.c:1185 > #3 0x00007fb0a824d182 in start_thread (arg=3D0x7fb09d605700) at > pthread_create.c:312 > #4 0x00007fb0a7f79fbd in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 >=20 > Q: why is max_size =3D=3D 0? Does this seem correct? Yes, I think that's normal for the 1st time through the loop; (see migratio= n_thread near the start max_size is initialised to 0). > We look at se->ops: >=20 > (gdb) print *se->ops > $9 =3D {set_params =3D 0x7fb0ae028820 , save_state =3D = 0x0, > cancel =3D 0x7fb0ae028f50 , > save_live_complete =3D 0x7fb0ae0299a0 , is_active = =3D > 0x7fb0ae028870 , > save_live_iterate =3D 0x7fb0ae029480 , save_live_se= tup > =3D 0x7fb0ae029330 , > save_live_pending =3D 0x7fb0ae028b30 , can_postcopy= =3D > 0x0, load_state =3D 0x7fb0ae0288b0 } >=20 > Why is can_postcopy() NULL? >=20 > (gdb) n > qemu_savevm_state_pending (f=3D0x7fb0b01e3a40, max_size=3Dmax_size@entry= =3D0, > res_non_postcopiable=3Dres_non_postcopiable@entry=3D0x7fb09d604c90, > res_postcopiable=3Dres_postcopiable@entry=3D0x7fb09d604c88) at > /home/hook/src/qemu/postcopy2/savevm.c:989 > 989 if (se->ops->can_postcopy(se->opaque)) { > (gdb) print *se > $14 =3D {entry =3D {tqe_next =3D 0x7fb0aff9ab30, tqe_prev =3D 0x7fb0aff88= f20}, > idstr =3D "block", '\000' , instance_id =3D 0, > alias_id =3D 0, version_id =3D 1, section_id =3D 1, ops =3D 0x7fb0ae684= 8e0 > , vmsd =3D 0x0, > opaque =3D 0x7fb0ae721200 , compat =3D 0x0, is_ram =3D= 1} > (gdb) step >=20 > Program received signal SIGSEGV, Segmentation fault. > 0x0000000000000000 in ?? () > (gdb)=20 >=20 >=20 > The patches appear to have been fully applied, but it would seem that the > savevm_block_handlers structure needs to be updated to populate this > field? Which implies that a new function will have to be written? >=20 > Or, if I have missed the obvious, I would appreciate enlightenment. Simple bug on my part; the line: if (se->ops->can_postcopy(se->opaque)) { needs to become: if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) { Thanks for the report. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK