From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57191)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aczY1-0008VK-Bk
	for qemu-devel@nongnu.org; Mon, 07 Mar 2016 13:06:26 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aczXy-0002qV-69
	for qemu-devel@nongnu.org; Mon, 07 Mar 2016 13:06:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:49812)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aczXy-0002qQ-1d
	for qemu-devel@nongnu.org; Mon, 07 Mar 2016 13:06:22 -0500
Date: Mon, 7 Mar 2016 18:06:17 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20160307180616.GB7950@work-vm>
References: <cover.1456212545.git.amit.shah@redhat.com>
	<33f7c8c309e6625942e6b8548faa96606a6f99b1.1456212545.git.amit.shah@redhat.com>
	<20160307124911.GB2253@work-vm> <56DD826D.5090306@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56DD826D.5090306@redhat.com>
Subject: Re: [Qemu-devel] [PULL 2/5] migration: move
 bdrv_invalidate_cache_all out of coroutine context
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, "Denis V. Lunev" <den@openvz.org>, qemu list <qemu-devel@nongnu.org>, Juan Quintela <quintela@redhat.com>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 07/03/2016 13:49, Dr. David Alan Gilbert wrote:
> >    b) The harder problem is that there's a race where qemu_bh_delete
> >       segs, and I'm not 100% sure why yet - it only does it sometimes
> >       (i.e. run virt-test and leave it and it occasionally does it).
> >       From the core it looks like qemu->bh is corrupt (0x10101010...)
> >       so maybe mis has been freed at that point?
> >       I'm suspecting this is the postcopy_ram_listen_thread freeing
> >       mis at the end of it, but I don't know yet.
> 
> That should be it.  Maybe the patch can simply be reverted, because
> loadvm_postcopy_handle_run runs from a thread and not a coroutine.  Is
> this correct?

That's still in the main thread; the 'run' comes from the packaged postcopy
state, but it arrives after the 'listener' thread has been started.

I need to understand this anyway; the way it's supposed to work is that
if postcopy is being used then not much cleanup happens in
process_incoming_migration_co; instead it exits and lets
postcopy_ram_listen_thread do the cleanup at the end. I've not quite
figured out what's going on here, but it almost looks like both of them
are cleaning up, and that shouldn't happen.
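
Roughly, the intended handoff is something like the sketch below. This is
a simplified illustration, not the actual QEMU code: IncomingState, its
fields and the three function names are made up for the example, and the
cleanup is reduced to a counter so the single-owner rule is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the incoming migration state. */
typedef struct IncomingState {
    bool postcopy_listening; /* set once the listen thread has been started */
    int cleanups;            /* counts cleanup calls, for illustration only */
} IncomingState;

/* Stand-in for the real teardown; here it only records that it ran. */
static void incoming_cleanup(IncomingState *s)
{
    assert(s != NULL);
    s->cleanups++;
}

/* End of the main load path: clean up only if postcopy did not take over. */
static void load_path_finish(IncomingState *s)
{
    if (s->postcopy_listening) {
        return; /* the listen thread owns the cleanup from here on */
    }
    incoming_cleanup(s);
}

/* End of the listen thread: it is the sole owner, so it always cleans up. */
static void listen_thread_finish(IncomingState *s)
{
    incoming_cleanup(s);
}
```

With postcopy_listening set, load_path_finish() leaves the state alone and
only listen_thread_finish() tears it down, so cleanup runs exactly once.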

> However I have a bug or two for you to fix, too:
> 
> 1) as far as I can see, postcopy_ram_listen_thread is not holding the
> mutex during the call to qemu_loadvm_state_main.  Is that a bug?

No; the guest is running, and the only thing that gets loaded by that
listen thread is data that's postcopied, i.e. currently just RAM pages,
which are loaded atomically.

> 2) no one is currently joining mis->listen_thread, I suspect it actually
> should be QEMU_THREAD_DETACHED.

OK, that looks like the easier one.
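
For reference, this is what a detached thread looks like in plain
pthreads. It is only a sketch of the idea: in QEMU the change would be the
mode argument passed to qemu_thread_create, and start_detached and
listen_thread_fn below are hypothetical names used for the example. A
detached thread releases its resources when it exits, so nobody has to
join it.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for the listen thread: signal completion and exit. */
static void *listen_thread_fn(void *opaque)
{
    sem_t *done = opaque;

    sem_post(done);
    return NULL;
}

/*
 * Start fn in a detached thread.
 * Returns 0 on success, or the error number from the failing pthread call.
 */
static int start_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    assert(fn != NULL);

    err = pthread_attr_init(&attr);
    if (err) {
        return err;
    }
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0) {
        err = pthread_create(&tid, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    return err;
}
```

Since a detached thread cannot be joined, anything that needs to know it
has finished has to be told some other way, here via the semaphore passed
in as the argument.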

Dave

> 
> :)
> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK