Message-ID: <4B0ABBD4.9030401@codemonkey.ws>
Date: Mon, 23 Nov 2009 10:44:04 -0600
From: Anthony Liguori
References: <4B0952C9.9010803@redhat.com> <4B095D86.700@codemonkey.ws>
 <4B09F0CA.3060705@codemonkey.ws> <20091123082659.GC2999@redhat.com>
 <4B0A87B2.1030507@codemonkey.ws> <4B0AA0F3.6070205@codemonkey.ws>
Subject: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
List-Id: qemu-devel.nongnu.org
To: Juan Quintela
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Gleb Natapov

Juan Quintela wrote:
> you can weasel the way you want (I can also do it).
>
> Customer had: 5.4 <-> 5.4 migration working (suboptimally)
> Now appears 5.4.1 that works best with migration. But he wants to do the
> migration in two steps:
>
> migrate from qemu 5.4 -> 5.4.1, and be able to migrate back if he doesn't
> like it.
>
> At some point, he will migrate to 5.4.1 knowing that it lost backward
> migration.
> Think of a cluster of machines here, and you just add a
> 5.4.1 machine into the mix, and want this to work while you haven't
> changed _all_ the machines.
>

If I'm a customer and you introduce this sort of change in a .z release,
I would certainly want to know about it and have control over it. I
don't want to transparently migrate from 5.4.1 to 5.4.0 and have my
guest's time start drifting. I specifically want that to fail.

If I wanted to support both models because I didn't care, then I would
start with -M 5.4.0 on all of my nodes. I know you don't have a -M 5.4.1
and -M 5.4.0, but if you're introducing this sort of change, you really
should.

>> However, if we rely on certain guest behavior, then it blows up the
>> testing matrix because now we have to test every guest with every
>> workload to see whether it works with migration. It's a slippery
>> slope that's hard to get off once you start.
>>
>
> I know :( But life sometimes doesn't agree with you. Notice that I
> understand that our problem is different from the upstream one. Our problem
> is more in migrating from 0.11.0 -> 0.11.1, and being able to go back.
> Changes in the savevm are only introduced if there is no other solution.
> But we want to be able to get the 0.11.0 behaviour in 0.11.1, because we
> have a mixed environment. Requesting to upgrade all the hosts at the
> same time is not going to fly with any BOFH :)
>

You've made a policy decision. As a user, I really don't like that
policy decision and it makes me want to make sure that we upgrade all of
our hosts at once to avoid this problem. Of course, I'm a control freak
and I'm particularly concerned about time drift issues as that's been
consuming a bit of my time lately.

>>> But if you now substitute qemu-0.11 and qemu-0.12 for RHEL5.4 and
>>> RHEL5.4.1, you will see that the code bases are going to be really,
>>> really similar. And if any savevm format is changed, it is because
>>> there is no other solution.
>>>
>>>
>> In our own stable branch, we do not introduce any savevm changes. I
>> would recommend the same policy for RHEL :-)
>>
>
> Except if we found a bug, and there is no other solution. That is what
> we try to do. And we would not change the format for a new feature, but
> what happens if it was a bug that a field is really missing?
>

Can we reasonably support a guest that doesn't have this older field?
If the answer is "yes", then it's a feature that can be delayed until
the next release.

>> You may be willing to expose this to your users but as an upstream
>> policy, I'm very opposed to it. You're breaking the contract of
>> migration by changing the guest's behavior from underneath it.
>>
>
> The lawyer inside me:
> - You are lying when you told me that qemu-0.11 -M pc-0.10 gives me a
>   pc-0.10 like machine. The savevm format is different.
>
> (after talking about contracts, I couldn't resist)
>

That's a bug that we need to fix.

> I could give you more examples. But that would just make the
> discussion longer. What we have here is:
>
> - migration between 0.11.0 -> 0.11.0 works some way
> - I want "that very way" between 0.11.1 -> 0.11.0.
>

Not a problem as long as we don't introduce features in the stable
branch.

>> A better approach would be having an option to "force" a migration
>> across incompatible versions. I think such an option would be pretty
>> dangerous to offer but at least it puts the decision in the hands of
>> the management software where it belongs.
>>
>
> The difference is where you put things. In the source (newer code) or
> in the target (older code). By definition, once you have changed
> something, you can change it to be backward compatible. What is a bit
> more difficult is to take the time machine, go to the past, and change
> 5.4 to be compatible with 5.4.1. (*)
>

The problem here isn't migration, it's what you've decided to backport
into your stable branch.
Note that the discussion we're having isn't about backporting pvclock
to qemu or qemu/kvm's stable branch. We're not going to change the
migration protocol in upstream to support a decision that we haven't
actually made. And from an upstream position, I would oppose
implementing the pvclock change in the stable branch exactly because of
the problems it would create with live migration.

Regards,

Anthony Liguori
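
Editorial illustration (not part of the original message): a minimal
sketch of the policy argued for above, under stated assumptions. A guest
is pinned to a machine type when it starts, a migration fails by default
when the destination does not know that machine type, and only an
explicit "force" from the management layer overrides the check. The
Host, Guest and migrate names are hypothetical and are not QEMU APIs,
and the "5.4.0" and "5.4.1" machine types are the hypothetical -M values
mentioned in the thread, which the message itself notes do not exist.

```python
from dataclasses import dataclass


class IncompatibleMigration(Exception):
    """Raised when the destination cannot provide the guest's machine type."""


@dataclass(frozen=True)
class Host:
    # Hypothetical model: the release a host runs and the machine types
    # (the values one would pass to -M) that this release can emulate.
    release: str
    machine_types: frozenset


@dataclass(frozen=True)
class Guest:
    name: str
    machine_type: str  # fixed when the guest is started


def migrate(guest, source, dest, force=False):
    """Return the destination host if the migration is allowed."""
    if guest.machine_type not in source.machine_types:
        raise ValueError("guest cannot be running on the source host")
    if guest.machine_type in dest.machine_types:
        return dest  # same guest-visible machine on both sides
    if force:
        # Explicit opt-in by the management layer; the guest-visible
        # behavior may change underneath the guest.
        return dest
    raise IncompatibleMigration(
        f"{dest.release} does not provide machine type {guest.machine_type}"
    )


old_host = Host("5.4.0", frozenset({"5.4.0"}))
new_host = Host("5.4.1", frozenset({"5.4.0", "5.4.1"}))

pinned = Guest("pinned", "5.4.0")  # started with the older machine type
fresh = Guest("fresh", "5.4.1")    # started with the newer machine type
```

In this sketch the "pinned" guest can move between both hosts in either
direction, while the "fresh" guest is refused on the older host unless
the caller passes force=True.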