From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=60861 helo=eggs.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1OOw3F-0002iO-2Y
    for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
    (envelope-from ) id 1OOw3D-00039L-K5
    for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:20 -0400
Received: from mail-gy0-f173.google.com ([209.85.160.173]:43333)
    by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from )
    id 1OOw3D-000395-H3
    for qemu-devel@nongnu.org; Wed, 16 Jun 2010 13:05:19 -0400
Received: by gyd5 with SMTP id 5so4337093gyd.4 for ;
    Wed, 16 Jun 2010 10:05:18 -0700 (PDT)
Message-ID: <4C19044B.6010602@codemonkey.ws>
Date: Wed, 16 Jun 2010 12:05:15 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <1276619430-15871-1-git-send-email-aliguori@us.ibm.com>
    <1276619430-15871-7-git-send-email-aliguori@us.ibm.com>
    <4C18D5FF.1050703@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [CFR 6/10] cont command
List-Id: qemu-devel.nongnu.org
To: Juan Quintela
Cc: Markus Armbruster, qemu-devel@nongnu.org, Stefan Hajnoczi,
    Luiz Capitulino

On 06/16/2010 11:17 AM, Juan Quintela wrote:
> Anthony Liguori wrote:
>
>> On 06/16/2010 08:11 AM, Juan Quintela wrote:
>>
>
>> It's only ensured if you've got the same disk image running on another
>> machine. Considering that we support migrating from a file and we
>> support migrating block devices, I don't think it's practical.
>>
>>
>>> - outgoing migration
>>>
>>> After successful migration, we can issue "cont" command in source, and
>>> having source and target running at the same time -> disk corruption
>>> again.
>>>
>>> My suggestion:
>>> - add a third state "incoming", and cont/stop don't work on that state
>>> - add a fourth state "migrated", and "cont" gives an explicit error, and you
>>>   have to run "cont --force" or "cont" twice (whatever) to get it to continue.
>>>
>>>
>> Very few users are going to do manual migration like this and those
>> that do have no good reason to execute cont in either of these
>> scenarios.
>>
> as of today, libvirt uses it (guess who filed that bug to me).
>

libvirt is not a human, so I fail to see how forcing it to use a --force
option would help them. Either we didn't document migration well enough
or their developers are not careful enough. Considering our lack of
documentation, I'm sure it was the former.

>> A --force command like this is equivalent to popping up a
>> message box saying "are you sure you really want to do this" which
>> most users find to be extremely annoying.
>>
> I had to debug this one from testers/field. They were testing things
> and it was very "practical" to launch guest on machine A, configure
> whatever they wanted, migrate to machine B. test whatever on machine B.
> back to machine A, continue.
>

Honestly, that's a terrible testing strategy. You cannot just execute
random commands and hope nothing bad happens.

> You can guess what happened. The problem here is that qemu is not
> giving user the _minimal_ advice that something could go wrong. And it
> is not going to be wrong, it is going to cause disk corruption for sure :(
>
>
>> We should try to inform users when it's likely that they'll stumble
>> upon a dangerous action. cache=volatile is a good example of this
>> because a user could have used it pretty easily and it's a reasonable
>> expectation that we wouldn't expose a feature that could lead to
>> corruption in obscure cases.
>>
> This is not _so_ obscure if you run qemu by hand :(
> you have a nice "(qemu)" prompt, and if you issue "cont", bad things happen.
>

And if you issue system_reset, quit, commit, loadvm, pci_del, or any set
of commands, bad things can happen, including some form of data loss or
corruption.

IMHO, there's a significant difference between twiddling something where
there is a reasonable expectation that the impact is only going to be
related to performance (like -smp X, -m X, or cache=X) and just trying
random things.

>> If a user executes cont in either of these scenarios and has two
>> copies of a virtual machine running accessing the same resources, then
>> they surely ought to expect bad behavior.
>>
> It is not _so_ easy O:-).
> Consider the example that I showed you:
>
> (host A)                 (host B)
> launch qemu              launch qemu -incoming
> migrate host B
>                          .....
>                          do your things
>                          exit/poweroff/...
>
> At this point you have a qemu launched on machine A, with nothing on
> machine B. running "cont" on machine A has disastrous consequences,
> and there is no way to prevent it :(
>

If there was a reasonable belief that it wouldn't result in disaster, I
would fully support you. However, I can't think of any rational reason
why someone would do this. I can't think of a better analogy to shooting
yourself in the foot.

> As I have received this bug from users a couple of times, I would like
> to be able to prevent this case.
>

I've never seen anyone run into this before. Can you show me a bug
report? I'd love to see how someone expected this to behave.

Regards,

Anthony Liguori

> Later, Juan.
>
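
For reference, the four-state proposal quoted above (an "incoming" state
where cont/stop are refused, and a "migrated" state where cont needs an
explicit override) could be sketched roughly as below. This is only an
illustration of the suggestion under discussion, not QEMU code: the names
RunState, StateError, cont, stop and the force parameter are all
hypothetical, and the behaviour of stop in the "migrated" state is an
assumption, since the thread does not specify it.

```python
from enum import Enum, auto


class RunState(Enum):
    RUNNING = auto()
    STOPPED = auto()
    INCOMING = auto()   # hypothetical: waiting for an incoming migration
    MIGRATED = auto()   # hypothetical: outgoing migration has completed


class StateError(Exception):
    """Raised when a command is refused in the current state."""


def cont(state: RunState, force: bool = False) -> RunState:
    """Return the state after a 'cont' request, or raise if it is refused."""
    if state is RunState.INCOMING:
        # Per the proposal, cont does not work while waiting for a migration.
        raise StateError("cont refused: incoming migration pending")
    if state is RunState.MIGRATED and not force:
        # Per the proposal, cont gives an explicit error unless forced.
        raise StateError("cont refused: guest was migrated; force to override")
    return RunState.RUNNING


def stop(state: RunState) -> RunState:
    """Return the state after a 'stop' request, or raise if it is refused."""
    if state is RunState.INCOMING:
        raise StateError("stop refused: incoming migration pending")
    if state is RunState.MIGRATED:
        # Assumption: the guest is already paused, so the state is unchanged.
        return state
    return RunState.STOPPED
```

With this sketch, cont(RunState.MIGRATED) raises StateError, while
cont(RunState.MIGRATED, force=True) resumes the guest; whether such an
override helps a non-interactive caller like libvirt is exactly the point
being argued in this thread.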