From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48982) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dBhS2-0007C9-NA for qemu-devel@nongnu.org; Fri, 19 May 2017 08:56:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dBhRy-0007D5-PN for qemu-devel@nongnu.org; Fri, 19 May 2017 08:56:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57554) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dBhRy-0007Cj-Gk for qemu-devel@nongnu.org; Fri, 19 May 2017 08:56:10 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 687204ACB6 for ; Fri, 19 May 2017 12:56:09 +0000 (UTC) Date: Fri, 19 May 2017 13:56:01 +0100 From: "Daniel P. Berrange" Message-ID: <20170519125601.GD28392@redhat.com> Reply-To: "Daniel P. Berrange" References: <1495176212-14446-1-git-send-email-peterx@redhat.com> <1495176212-14446-2-git-send-email-peterx@redhat.com> <20170519082538.GD4912@redhat.com> <20170519125139.GM2081@work-vm> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170519125139.GM2081@work-vm> Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: Peter Xu , qemu-devel@nongnu.org, Juan Quintela On Fri, May 19, 2017 at 01:51:43PM +0100, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berrange@redhat.com) wrote: > > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote: > > > We don't really have a return path for the other types yet. Let's check > > > this when .get_return_path() is called. > > > > > > For this, we introduce a new feature bit, and set it up only for socket > > > typed IO channels. > > > > > > This will help detect earlier failure for postcopy, e.g., logically > > > speaking postcopy cannot work with "exec:". Before this patch, when we > > > try to migrate with "migrate -d exec:cat>out", we'll hang the system. > > > With this patch, we'll get: > > > > > > (qemu) migrate -d exec:cat>out > > > Unable to open return-path for postcopy > > > > This is wrong - post-copy migration *can* work with exec: - it just entirely > > depends on what command you are running. Your example ran a command which is > > unidirectional, but if you ran 'exec:socat ...' you would have a fully > > bidirectional channel. Actually the channel is always bi-directional, but > > 'cat' simply won't ever send data back to QEMU. > > The thing is it didn't used to be able to; prior to your conversion to > channel, postcopy would reject being started with exec: because it > couldn't open a return path, so it was safe. It safe but functionally broken because it is valid to want to use exec migration with post-copy. > > If QEMU hangs when the other end doesn't send data back, that actually seems > > like a potentially serious bug in migration code. Even if using the normal > > 'tcp' migration protocol, if the target QEMU server hangs and fails to > > send data to QEMU on the return path, the source QEMU must never hang. > > Hmm, we shouldn't get a 'hang' with a postcopy on a link without a > return path; but it does depend on how the exec: behaves on the > destination. > If the destination discards data written to it, then I think the > behaviour would be: > a) Page requests would just get dropped, they'd eventually get > fulfilled by the background page transmissions, but that could mean that > a page request would wait for minutes for the page. > b) The qemu main thread on the destination can be blocked by that, so > the monitor might not respond until the page request is fulfilled. > c) I'm not quite sure what would happen to the source return-path > thread > > The behaviour seems to have changed between 2.9.0 (f26 package) and my > reasonably recent head build. That's due to the bug we just fixed where we mistakenly didn't allow bi-directional I/O for exec commit 062d81f0e968fe1597474735f3ea038065027372 Author: Daniel P. Berrange Date: Fri Apr 21 12:12:20 2017 +0100 migration: setup bi-directional I/O channel for exec: protocol Historically the migration data channel has only needed to be unidirectional. Thus the 'exec:' protocol was requesting an I/O channel with O_RDONLY on incoming side, and O_WRONLY on the outgoing side. This is fine for classic migration, but if you then try to run TLS over it, this fails because the TLS handshake requires a bi-directional channel. Signed-off-by: Daniel P. Berrange Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela > A migration_cancel also doesn't work for 'exec' because it doesn't > support shutdown() - it just sticks in 'cancelling'. > On a socket that was broken like this the cancel would work because > it issues a shutdown() which causes the socket to cleanup. I'm curious why migration_cancel requires shutdown() to work ? Why isn't it sufficient from the source POV to just close the socket entirely straight away. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|