From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55996)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dBelJ-0005g7-3g
	for qemu-devel@nongnu.org; Fri, 19 May 2017 06:03:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1dBelF-0007XC-4F
	for qemu-devel@nongnu.org; Fri, 19 May 2017 06:03:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46784)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71)
	(envelope-from ) id 1dBelE-0007Wm-Rn
	for qemu-devel@nongnu.org; Fri, 19 May 2017 06:03:53 -0400
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id C9DEAC04BD2D
	for ; Fri, 19 May 2017 10:03:51 +0000 (UTC)
Date: Fri, 19 May 2017 11:03:44 +0100
From: "Daniel P. Berrange"
Message-ID: <20170519100344.GG4912@redhat.com>
Reply-To: "Daniel P. Berrange"
References: <1495176212-14446-1-git-send-email-peterx@redhat.com>
	<1495176212-14446-2-git-send-email-peterx@redhat.com>
	<20170519082538.GD4912@redhat.com>
	<20170519095143.GC14679@pxdev.xzpeter.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20170519095143.GC14679@pxdev.xzpeter.org>
Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed
To: Peter Xu
Cc: qemu-devel@nongnu.org, Juan Quintela , "Dr . David Alan Gilbert"

On Fri, May 19, 2017 at 05:51:43PM +0800, Peter Xu wrote:
> On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > > We don't really have a return path for the other types yet. Let's check
> > > this when .get_return_path() is called.
> > >
> > > For this, we introduce a new feature bit, and set it up only for socket
> > > typed IO channels.
> > >
> > > This will help detect earlier failure for postcopy, e.g., logically
> > > speaking postcopy cannot work with "exec:". Before this patch, when we
> > > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > > With this patch, we'll get:
> > >
> > > (qemu) migrate -d exec:cat>out
> > > Unable to open return-path for postcopy
> >
> > This is wrong - post-copy migration *can* work with exec: - it just entirely
> > depends on what command you are running. Your example ran a command which is
> > unidirectional, but if you ran 'exec:socat ...' you would have a fully
> > bidirectional channel. Actually the channel is always bi-directional, but
> > 'cat' simply won't ever send data back to QEMU.
>
> Indeed. I should not block postcopy if the user used a TCP tunnel
> between the source and destination in some way, using this exec: way.
> Thanks for pointing that out.
>
> However I still think the idea is needed here. Say, we'd better know
> whether the transport would be able to respond (though current
> approach of "assuming sockets are the only ones that can reply" is not
> a good solution...). Please see below.
>
> >
> > If QEMU hangs when the other end doesn't send data back, that actually seems
> > like a potentially serious bug in migration code. Even if using the normal
> > 'tcp' migration protocol, if the target QEMU server hangs and fails to
> > send data to QEMU on the return path, the source QEMU must never hang.
>
> Firstly I should not say it's a hang - it's actually by-design here
> imho - migration thread is in the last phase now, waiting for a SHUT
> message from destination (which I think is wise). But from the
> behavior, indeed src VM is not usable during the time, just like what
> happened for most postcopy cases on the source side. So, we can see
> that postcopy "assumes" that destination side can reply now.
>
> Meanwhile, I see it reasonable for postcopy to have such an
> assumption. After all, postcopy means "start VM on destination before
> pages are moved over completely", then there must be someone to reply
> to source, no matter whether it'll be via some kind of io channel.
>
> That's why I think we still need the general idea here, that we need
> to know whether destination end is able to reply.
>
> But, I still have no good idea (after knowing this patch won't work)
> on how we can do this... Any further suggestions would be greatly
> welcomed.

IMHO this is nothing more than a documentation issue for the 'exec'
protocol. ie, document that you should provide a bi-directional
transport for live migration.

A uni-directional transport is arguably only valid if you're using
migrate to save/restore the VM state to a file.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
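
The distinction drawn above between a bidirectional channel and a command that never sends anything back can be tried outside QEMU. The following is only a rough sketch and is not QEMU code: the helper name peer_replies, the use of a socketpair as the child's stdin/stdout, and the one-second timeout are all assumptions made for illustration. It runs a shell command, sends it a few bytes, and reports whether anything comes back before the timeout, so the caller never blocks forever on a silent peer.

```python
import socket
import subprocess


def peer_replies(command, payload=b"ping\n", timeout=1.0):
    """Hypothetical helper (not part of QEMU).

    Runs `command` through the shell with one end of a socketpair as
    both its stdin and stdout, sends `payload`, and returns True if any
    bytes come back within `timeout` seconds, False otherwise.
    """
    ours, theirs = socket.socketpair()
    proc = subprocess.Popen(command, shell=True, stdin=theirs, stdout=theirs)
    theirs.close()  # the child keeps its own copies as fd 0 and fd 1
    try:
        ours.sendall(payload)
        ours.settimeout(timeout)  # bound the wait instead of hanging
        try:
            return len(ours.recv(4096)) > 0
        except socket.timeout:
            return False
    finally:
        ours.close()  # the child sees EOF on stdin and exits
        proc.wait()


if __name__ == "__main__":
    # 'cat' copies stdin to stdout, so data flows back to us.
    print("cat            ->", peer_replies("cat"))
    # Redirecting the output elsewhere leaves the channel itself intact,
    # but nothing is ever written back on it.
    print("cat>/dev/null  ->", peer_replies("cat > /dev/null"))
```

In both runs the channel handed to the child is the same; only the command decides whether a reply ever arrives, which is why checking the channel type alone cannot answer the question.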