From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=55458 helo=eggs.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1OuqDI-0004Rl-8R
    for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:38 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
    (envelope-from ) id 1OuqDG-00058N-BT
    for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:36 -0400
Received: from mail-gy0-f173.google.com ([209.85.160.173]:56676)
    by eggs.gnu.org with esmtp (Exim 4.69)
    (envelope-from ) id 1OuqDG-00058F-3F
    for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:34 -0400
Received: by gya1 with SMTP id 1so2163194gya.4
    for ; Sun, 12 Sep 2010 10:19:33 -0700 (PDT)
Message-ID: <4C8D0BA3.7050706@codemonkey.ws>
Date: Sun, 12 Sep 2010 12:19:31 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
References: <4C864118.7070206@linux.vnet.ibm.com> <4C864D65.6090004@redhat.com>
    <4C8652CB.9060801@linux.vnet.ibm.com> <4C8CCA91.4060001@redhat.com>
    <4C8CD4DB.9020905@codemonkey.ws> <4C8CD847.8030804@redhat.com>
    <4C8CF07C.5040509@codemonkey.ws> <4C8D0394.6010605@redhat.com>
In-Reply-To: <4C8D0394.6010605@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Avi Kivity
Cc: Kevin Wolf , Stefan Hajnoczi , qemu-devel , "libvir-list@redhat.com" , Stefan Hajnoczi

On 09/12/2010 11:45 AM, Avi Kivity wrote:
>> Streaming relies on copy-on-read to do the writing.
>
>
> Ah.  You can avoid the copy-on-read implementation in the block format
> driver and do it completely in generic code.

Copy on read takes advantage of temporal locality.  You wouldn't want to stream without copy on read because you decrease your idle I/O time by not effectively caching.

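To make the copy-on-read point concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: `CorImage`, `cor_read()` and `stream_all()` are hypothetical names, a "cluster" is shrunk to a single byte, and none of this is the actual QEMU block layer or QED code. It shows that a guest read which misses locally populates the local image as a side effect, so a later read of the same cluster never touches the backing image, and that streaming can then be expressed as simply reading every cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_CLUSTERS = 8 };

/* Hypothetical toy image: one byte stands in for one cluster. */
typedef struct CorImage {
    unsigned char local[NUM_CLUSTERS];   /* data already copied locally */
    bool present[NUM_CLUSTERS];          /* true once a cluster is local */
    const unsigned char *backing;        /* read-only backing image */
    unsigned backing_reads;              /* how often we hit the backing image */
} CorImage;

/* Read one cluster; on a local miss, copy it from the backing image first. */
static int cor_read(CorImage *img, size_t cluster, unsigned char *out)
{
    assert(img != NULL && out != NULL);
    if (cluster >= NUM_CLUSTERS) {
        return -1;
    }
    if (!img->present[cluster]) {
        /* Copy on read: the read itself performs the local write. */
        img->local[cluster] = img->backing[cluster];
        img->present[cluster] = true;
        img->backing_reads++;
    }
    *out = img->local[cluster];
    return 0;
}

/* Streaming as a loop of reads; clusters the guest already touched are skipped
 * implicitly because cor_read() finds them present. */
static void stream_all(CorImage *img)
{
    unsigned char scratch;
    for (size_t c = 0; c < NUM_CLUSTERS; c++) {
        (void)cor_read(img, c, &scratch);
    }
}
```

In this toy model, clusters the guest reads while streaming is in progress are never fetched a second time by the streaming loop, which is the caching effect referred to above. Serialization against concurrent guest writes is deliberately left out.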
>>> stream_4():
>>>    increment offset
>>>    if more:
>>>         bdrv_aio_stream()
>>>
>>>
>>> Of course, need to serialize wrt guest writes, which adds a bit more
>>> complexity.  I'll leave it to you to code the state machine for that.
>>
>> http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d
>>
>
> Clever - it pushes all the synchronization into the copy-on-read
> implementation.  But the serialization there hardly jumps out of the
> code.
>
> Do I understand correctly that you can only have one allocating read
> or write running?

Cluster allocation, L2 cache allocation, or on-disk L2 allocation?

You only have one on-disk L2 allocation at one time.  That's just an implementation detail at the moment.  An on-disk L2 allocation happens only when writing to a new cluster that requires a totally new L2 entry.  Since L2s cover 2GB of logical space, it's a rare event, so this turns out to be pretty reasonable for a first implementation.

Parallel on-disk L2 allocations are not that difficult; it's just a future TODO.

>>
>> Generally, I think the block layer makes more sense if the interface
>> to the formats is high level and code sharing is achieved not by
>> mandating a world view but rather by making libraries of common
>> functionality.  This is more akin to how the FS layer works in Linux.
>>
>> So IMHO, we ought to add a bdrv_aio_commit function, turn the current
>> code into a generic_aio_commit, implement a qed_aio_commit, then
>> somehow do qcow2_aio_commit, and look at what we can refactor into
>> common code.
>
> What Linux does is have an equivalent of bdrv_generic_aio_commit()
> which most implementations call (or default to), and only do something
> if they want something special.  Something like commit (or
> copy-on-read, or copy-on-write, or streaming) can be implemented 100% in
> terms of the generic functions (and indeed qcow2 backing files can be
> any format).

Yes, what I'm really saying is that we should take the bdrv_generic_aio_commit() approach.  I think we're in agreement here.

Regards,

Anthony Liguori
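
For reference, the "generic helper that drivers call or default to" pattern agreed on above can be sketched as follows. This is a minimal illustration, not QEMU code: `Driver`, `Image`, `generic_commit()`, `image_commit()` and `special_commit()` are hypothetical names, the operation is synchronous rather than AIO, and a cluster is a single byte. The only point is the dispatch shape: a driver with no commit hook falls back to the generic routine, and a driver that wants something special does its own work and then reuses the same generic routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's BlockDriver types. */
typedef struct Image Image;

typedef struct Driver {
    const char *name;
    /* Optional format-specific commit; NULL means "use the generic one". */
    int (*commit)(Image *img);
} Driver;

struct Image {
    const Driver *drv;
    Image *backing;              /* image that commit writes into */
    unsigned char data[8];       /* one byte per "cluster" */
    unsigned char allocated[8];  /* 1 if this image owns the cluster */
    int special_commits;         /* counts format-specific commits */
};

/* Generic commit: push every allocated cluster down into the backing image. */
static int generic_commit(Image *img)
{
    if (img->backing == NULL) {
        return -1;               /* nothing to commit into */
    }
    for (size_t i = 0; i < sizeof(img->data); i++) {
        if (img->allocated[i]) {
            img->backing->data[i] = img->data[i];
            img->backing->allocated[i] = 1;
            img->allocated[i] = 0;
        }
    }
    return 0;
}

/* Entry point: use the driver's hook if present, else the generic helper. */
static int image_commit(Image *img)
{
    assert(img != NULL && img->drv != NULL);
    if (img->drv->commit != NULL) {
        return img->drv->commit(img);
    }
    return generic_commit(img);
}

/* A format that does something extra, then reuses the shared helper. */
static int special_commit(Image *img)
{
    img->special_commits++;      /* placeholder for format-specific work */
    return generic_commit(img);
}
```

A driver table entry such as `{ "plain", NULL }` gets the generic behaviour for free, while `{ "special", special_commit }` opts into its own path without duplicating the copy loop.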