Message-ID: <4CE19A01.8000109@codemonkey.ws>
Date: Mon, 15 Nov 2010 14:37:21 -0600
From: Anthony Liguori
References: <20101104180406.GA2820@redhat.com> <5f91d600c749e66de107f60298c5ebd36645beff.1288892774.git.mst@redhat.com> <4CE149CB.1040301@codemonkey.ws> <20101115151817.GA30509@redhat.com> <4CE15B39.5070207@codemonkey.ws> <20101115202624.GA2859@redhat.com>
In-Reply-To: <20101115202624.GA2859@redhat.com>
Subject: [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device
List-Id: qemu-devel.nongnu.org
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, Juan Quintela

On 11/15/2010 02:26 PM, Michael S. Tsirkin wrote:
> Are there any system ones?

There's one in qemu-char.c.  Not sure it's easy to just say for the
future that there can't be BHs that get delivered during vmstop.

> Can we just stop processing them?

We ran into a lot of problems trying to do this with emulating
synchronous I/O in the block layer.  If we can avoid the
stop-dispatching handlers approach, I think we'll have far fewer problems.

>> I think that if we put
>> something in the network layer that just queued packets if the vm is
>> stopped, it would be a more robust solution to the problem.
>>
> Will only work for -net. The problem is for anything
> that can trigger activity when vm is stopped.

Activity or external I/O?

My assertion is that if we can stop things from hitting the disk,
escaping the network, or leaving a character device, we're solid.

>>> For memory, it is much worse:  any memory changes can either get
>>> discarded or not.  This breaks consistency guarantees that guest relies
>>> upon. Imagine virtio index getting updated but content not being
>>> updated.  See?
>>>
>> If you suppress any I/O then the memory changes don't matter because
>> the same changes will happen on the destination too.
>>
> They matter, and same changes won't happen.
> Example:
>
> virtio used index is in page 1, it can point at data in page 2.
> device writes into data, *then* into index. Order matters,
> but won't be preserved: migration assumes memory does not
> change after vmstop, and so it might send old values for
> data but new values for index.  Result will be invalid
> data coming into guest.

No, this scenario is invalid.

Migration copies data while the guest is live and whenever a change
happens, updates the dirty bitmap (that's why cpu_physical_memory_rw
matters).

Once the VM is stopped, we never return to the main loop before
completing migration so nothing else gets an opportunity to run.  This
means when we send a page, we guarantee it won't be made dirty again
until after the migration completes.

Once the migration completes, the contents of memory may change but
that's okay.  As long as you stop packets from being sent, if you need
to resume the guest, it'll pick up where it left off.

> On the destination guest will pick up the index and
> get bad (stale) data.

If you're seeing this happen with vhost, you aren't updating dirty bits
correctly.
AFAICT, this cannot happen with userspace device models.

>> I think this basic problem is the same as Kemari.  We can either
>> attempt to totally freeze a guest which means stopping all callbacks
>> that are device related or we can prevent I/O from happening which
>> should introduce enough determinism to fix the problem in practice.
>>
>> Regards,
>>
>> Anthony Liguori
>>
> See above. IMO it's a different problem. Unlike Kemari,
> I don't really see any drawbacks to stop all callbacks.
> Do you?

I think it's going to be extremely difficult to get right and keep
right.  A bunch of code needs to be converted to make us safe and then
becomes brittle as it's very simple to change things in such a way that
we're broken again.

Regards,

Anthony Liguori
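
The dirty-bitmap argument in the thread above (index in page 1, data in page 2) can be illustrated with a small sketch. This is a toy model under stated assumptions, not QEMU code: the names toy_vm, toy_write, toy_send_dirty and toy_migrate_index_after_data are all hypothetical, and the single write helper only stands in for the role the message gives to cpu_physical_memory_rw. It assumes every write is tracked and that nothing writes memory once the VM is stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model: 4 tiny "pages", one dirty bit per page. */
enum { NPAGES = 4, PAGE_SIZE = 16 };

struct toy_vm {
    unsigned char mem[NPAGES][PAGE_SIZE];
    bool dirty[NPAGES];
    bool running;
};

/* All writes go through one helper that marks the page dirty.
 * Assumption of the model: no write may happen after the VM stops. */
static void toy_write(struct toy_vm *vm, size_t page, size_t off,
                      unsigned char val)
{
    assert(vm->running);
    assert(page < NPAGES && off < PAGE_SIZE);
    vm->mem[page][off] = val;
    vm->dirty[page] = true;
}

/* Copy every dirty page to the destination and clear its bit.
 * Returns the number of pages sent in this pass. */
static size_t toy_send_dirty(struct toy_vm *vm,
                             unsigned char dst[NPAGES][PAGE_SIZE])
{
    size_t sent = 0;
    for (size_t p = 0; p < NPAGES; p++) {
        if (vm->dirty[p]) {
            memcpy(dst[p], vm->mem[p], PAGE_SIZE);
            vm->dirty[p] = false;
            sent++;
        }
    }
    return sent;
}

/* Scenario from the thread: used index in page 1, its data in page 2.
 * Returns true if the destination ends up identical to the source. */
static bool toy_migrate_index_after_data(void)
{
    struct toy_vm vm = { .running = true };
    unsigned char dst[NPAGES][PAGE_SIZE] = { { 0 } };

    /* Live pass: start with everything dirty so all pages are sent once. */
    for (size_t p = 0; p < NPAGES; p++) {
        vm.dirty[p] = true;
    }
    toy_send_dirty(&vm, dst);

    /* Guest still live: the device writes the data, then the index.
     * Both pages were already sent, so both get marked dirty again. */
    toy_write(&vm, 2, 0, 0xAB);   /* data in page 2 */
    toy_write(&vm, 1, 0, 1);      /* used index in page 1 */

    /* Stop the VM; from here on nothing writes memory. */
    vm.running = false;

    /* Final pass resends exactly the two pages dirtied while live. */
    size_t resent = toy_send_dirty(&vm, dst);

    return resent == 2 && memcmp(dst, vm.mem, sizeof dst) == 0;
}
```

In this model the destination can only see a new index with stale data if some write bypasses toy_write, which corresponds to the "you aren't updating dirty bits correctly" case described for vhost.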