From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=60861 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OOw3F-0002iO-2Y
	for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OOw3D-00039L-K5
	for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:20 -0400
Received: from mail-gy0-f173.google.com ([209.85.160.173]:43333)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OOw3D-000395-H3
	for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:19 -0400
Received: by gyd5 with SMTP id 5so4337093gyd.4
	for <qemu-devel@nongnu.org>; Wed, 16 Jun 2010 10:05:18 -0700 (PDT)
Message-ID: <4C19044B.6010602@codemonkey.ws>
Date: Wed, 16 Jun 2010 12:05:15 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <1276619430-15871-1-git-send-email-aliguori@us.ibm.com>	<1276619430-15871-7-git-send-email-aliguori@us.ibm.com>	<m3631jnmv7.fsf@trasno.mitica>
	<4C18D5FF.1050703@codemonkey.ws> <m3iq5jx884.fsf@trasno.mitica>
In-Reply-To: <m3iq5jx884.fsf@trasno.mitica>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [CFR 6/10] cont command
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Juan Quintela <quintela@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, Luiz Capitulino <lcapitulino@redhat.com>

On 06/16/2010 11:17 AM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>> On 06/16/2010 08:11 AM, Juan Quintela wrote:
>>      
>    
>> It's only ensured if you've got the same disk image running on another
>> machine.  Considering that we support migrating from a file and we
>> support migrating block devices, I don't think it's practical.
>>
>>      
>>> - outgoing migration
>>>
>>> After successful migration, we can issue the "cont" command on the source,
>>> and have source and target running at the same time ->   disk corruption
>>> again.
>>>
>>> My suggestion:
>>> - add a third state "incoming", and cont/stop don't work on that state
>>> - add a fourth state "migrated", and "cont" gives an explicit error, and you
>>>     have to run "cont --force" or "cont" twice (whatever) to get it to continue.
>>>
>>>        
>> Very few users are going to do manual migration like this and those
>> that do have no good reason to execute cont in either of these
>> scenarios.
>>      
> As of today, libvirt uses it (guess who filed that bug with me).
>    

libvirt is not a human so I fail to see how forcing it to use a --force 
option would help them.

Either we didn't document migration well enough or their developers are 
not careful enough.  Considering our lack of documentation, I'm sure it 
was the former.

>>   A --force command like this is equivalent to popping up a
>> message box saying "are you sure you really want to do this" which
>> most users find to be extremely annoying.
>>      
> I had to debug this one from testers/field.  They were testing things
> and it was very "practical" to launch guest on machine A, configure
> whatever they wanted, migrate to machine B.  test whatever on machine B.
> back to machine A, continue.
>    

Honestly, that's a terrible testing strategy.  You cannot just execute 
random commands and hope nothing bad happens.

> You can guess what happened.  The problem here is that qemu is not
> giving the user the _minimal_ advice that something could go wrong.  And it
> is not just that it could go wrong: it is going to cause disk corruption
> for sure :(
>
>    
>> We should try to inform users when it's likely that they'll stumble
>> upon a dangerous action.  cache=volatile is a good example of this
>> because a user could have used it pretty easily and it's a reasonable
>> expectation that we wouldn't expose a feature that could lead to
>> corruption in obscure cases.
>>      
> This is not _so_ obscure if you run qemu by hand :(
> you have a nice "(qemu)" prompt, and if you issue "cont", bad things happen.
>    

And if you issue system_reset, quit, commit, loadvm, pci_del, or any 
number of other commands, bad things can happen, including some form of 
data loss or corruption.

IMHO, there's a significant difference between twiddling something where 
there is a reasonable expectation that the impact is only going to be 
related to performance (like -smp X, -m X, or cache=X) and just trying 
random things.

>> If a user executes cont in either of these scenarios and has two
>> copies of a virtual machine running accessing the same resources, then
>> they surely ought to expect bad behavior.
>>      
> It is not _so_ easy O:-).
> Consider the example that I showed you:
>
> (host A)		(host B)
> launch qemu             launch qemu -incoming
> migrate host B
>                          .....
>                          do your things
>                          exit/poweroff/...
>
> At this point you have a qemu launched on machine A, with nothing on
> machine B.  Running "cont" on machine A has disastrous consequences,
> and there is no way to prevent it :(
>    

If there was a reasonable belief that it wouldn't result in disaster, I 
would fully support you.  However, I can't think of any rational reason 
why someone would do this.  I can't think of a better analogy to 
shooting yourself in the foot.

> As I have received this bug from users a couple of times, I would like
> to be able to prevent this case.
>    

I've never seen anyone run into this before.  Can you show me a bug 
report?  I'd love to see how someone expected this to behave.

Regards,

Anthony Liguori

> Later, Juan.
>