From mboxrd@z Thu Jan 1 00:00:00 1970
From: Uri Lublin
Subject: Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Date: Sun, 19 Oct 2008 15:46:43 +0200
Message-ID: <48FB3A43.9000506@il.qumranet.com>
References: <1223829030-14962-1-git-send-email-uril@qumranet.com> <48F22BF1.3000608@redhat.com> <48F23D4D.2050709@codemonkey.ws> <48F23F42.10405@redhat.com> <48F277A0.8040407@codemonkey.ws> <48F2BA83.7000101@codemonkey.ws> <48F69AAB.4010404@il.qumranet.com> <48F6BFA1.9070608@codemonkey.ws> <48F6F7AA.2080102@redhat.com> <48F7399B.7000808@codemonkey.ws> <48F74E6C.8070100@il.qumranet.com> <48F75078.5090604@redhat.com> <48F75483.1020901@il.qumranet.com> <48F7FCC7.2020108@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:38605 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750787AbYJSNqq (ORCPT ); Sun, 19 Oct 2008 09:46:46 -0400
In-Reply-To: <48F7FCC7.2020108@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Anthony Liguori wrote:
> Uri Lublin wrote:
>>
>> That is true, but in the case I mentioned above it would take the
>> management tool some time (guest down time) to realize what happens,
>> and to send "cont" to the SRC. With end-of-migration messages SRC
>> discovers DST fails and immediately continues.
>> I agree those messages add some complexity, and slow things a bit for
>> the good/average case.
>
> It's the classic general's dilemma. If SRC waits for DST to send an
> ACK, DST still doesn't know whether SRC received the ACK so it doesn't
> know whether it's truly safe to continue.
>
> This is why migration doesn't quit SRC immediately, and leaves SRC in
> the stopped state. It's because the only safe way to handle this is
> with a third party that is reliable.
>

In the scenario above (with ACK/GO messages), SRC _does_ know that DST has failed, because it does not receive an ACK. With ACK/GO messages we need third-party involvement only to handle the scenario where GO does not reach DST. Without ACK/GO messages we need third-party involvement for almost any state-load function failure. In other words, the risk/exposure is smaller with ACK/GO messages.

Since both approaches require third-party involvement in the worst case, and since in the good/normal case those messages slow down the migration process a bit (and complicate the code a bit), I do not mind dropping them. I just wanted to make sure we all understand their benefit. We can always add them later if we find that we "miss" them (that is, if they turn out to be more useful than we now think).

In any case, we need a way to get the migration status on the destination. At a minimum, DST should term_printf a message reporting that status. We could also change the result of 'info migration' on the destination, or add a new info command. Specifically, if the guest dies on DST, management software must be able to distinguish between a migration problem (in which case it must continue running the guest on SRC) and a successful migration followed by the guest dying (in which case it must quit the guest on SRC).

Regards,
Uri