Date: Tue, 14 May 2013 10:50:46 +0200
From: Kevin Wolf
To: Wolfgang Richter
Cc: Paolo Bonzini, qemu-devel, stefanha
Subject: Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

On 13.05.2013 at 23:21, Wolfgang Richter wrote:
> I'm working on a new patch series which will add a new QMP command,
> block-trace, which turns on tracing of writes for a specified block device and
> sends the stream unmodified to another block device. The 'trace' is meant to
> be precise, meaning that writes are not lost, which differentiates this command
> from others. It can be turned on and off depending on when it is needed.
>
> How is this different from block-backup or drive-mirror?
> --------------------------------------------------------
>
> block-backup is designed to create point-in-time snapshots and not clone the
> entire write stream of a VM to a particular device. It implements
> copy-on-write to create a snapshot.
> Thus, whenever a write occurs, block-backup is designed to send the original
> data and not the contents of the new write.
>
> drive-mirror is designed to mirror a disk to another location. It operates by
> periodically scanning a dirty bitmap and cloning blocks when dirtied. This is
> efficient as it allows for batching of writes, but it does not maintain the
> order in which guest writes occurred and it can miss intermediate writes when
> they go to the same location on disk.

Or, to translate it into our existing terminology: drive-mirror implements a
passive mirror, whereas you're proposing an active one (which we do want to
have).

With an active mirror, we'll want to have another choice: the mirror can be
synchronous (guest writes only complete after the mirrored write has
completed) or asynchronous (completion is based only on the original image).
It should be easy enough to support both once an active mirror exists.

> How can block-trace be used?
> ----------------------------
>
> (1) Disk introspection: systems which analyze the writes going to a disk for
> introspection require a perfect clone of the write stream to an original disk
> to stay in sync with updates to guest file systems.
>
> (2) Replicated block device: two block devices could be maintained as exact
> copies of each other up to a point in the disk write stream that has
> successfully been written to the destination block device.

You're leaving out the most interesting section: how should block-trace be
implemented?

The first question is what the API should look like on the QMP level. I
think the original idea was to use drive-mirror for all kinds of mirrors,
but maybe it does make more sense to keep the active mirror separate. I
don't particularly like the name block-trace for a separate command, but
let's save the bikeshedding for later.

The other question is how to implement it internally.
I don't think adding specific code for each new block job into
bdrv_co_do_writev() is acceptable. We really need a generic way to intercept
I/O operations.

The keyword from earlier discussions is "block filters". Essentially, the
idea is that the block job temporarily adds a BlockDriverState on top of the
format driver and can then implement whichever callbacks it wants to
intercept. The bad news is that the infrastructure isn't there yet to
actually make this happen in a sane way.

Kevin
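A toy model may help make the passive/active and synchronous/asynchronous distinctions above concrete. This is a hypothetical sketch in plain C, not QEMU code: `Image`, `Write`, `ActiveMirror`, `PassiveMirror`, `active_write()`, `active_drain()`, `passive_write()` and `passive_sync()` are invented names, the "disk" is just an array of integer blocks, and the active mirror only stands in for the block filter idea (a layer that sees every write before it reaches the image). It does not reflect the real BlockDriverState interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 8   /* hypothetical tiny disk */
#define MAX_WRITES 32  /* fixed capacity for the toy queues */

typedef struct { size_t block; int value; } Write;
typedef struct { int blocks[NUM_BLOCKS]; } Image;

static void image_write(Image *img, size_t block, int value) {
    assert(block < NUM_BLOCKS);
    img->blocks[block] = value;
}

/* Active mirror: intercepts every guest write, like a filter layer would. */
typedef struct {
    Image *source;             /* image the guest uses */
    Image *target;             /* mirror/trace destination */
    bool synchronous;          /* true: forward before the write completes */
    Write stream[MAX_WRITES];  /* writes delivered to the target, in order */
    size_t delivered;
    Write pending[MAX_WRITES]; /* async mode: accepted, not yet forwarded */
    size_t npending;
} ActiveMirror;

static void forward(ActiveMirror *m, Write w) {
    assert(m->delivered < MAX_WRITES);
    image_write(m->target, w.block, w.value);
    m->stream[m->delivered++] = w;
}

/* Intercepted guest write; when this returns, the guest write is complete. */
static void active_write(ActiveMirror *m, size_t block, int value) {
    Write w = { block, value };
    image_write(m->source, block, value);
    if (m->synchronous) {
        forward(m, w);                 /* completion waits for the mirror */
    } else {
        assert(m->npending < MAX_WRITES);
        m->pending[m->npending++] = w; /* completion based on source only */
    }
}

/* Async mode: forward queued writes to the target in guest order. */
static void active_drain(ActiveMirror *m) {
    for (size_t i = 0; i < m->npending; i++)
        forward(m, m->pending[i]);
    m->npending = 0;
}

/* Passive mirror: only remembers which blocks are dirty, not the writes. */
typedef struct {
    Image *source;
    Image *target;
    bool dirty[NUM_BLOCKS];
    size_t copied;             /* number of block copies sent to the target */
} PassiveMirror;

static void passive_write(PassiveMirror *m, size_t block, int value) {
    image_write(m->source, block, value);
    m->dirty[block] = true;
}

/* Periodic pass: copy the current contents of dirty blocks, in block order. */
static void passive_sync(PassiveMirror *m) {
    for (size_t b = 0; b < NUM_BLOCKS; b++) {
        if (m->dirty[b]) {
            image_write(m->target, b, m->source->blocks[b]);
            m->dirty[b] = false;
            m->copied++;
        }
    }
}
```

With the guest writing block 5, then block 2, then block 5 again, the synchronous active mirror delivers all three writes to the target in guest order. The passive mirror copies only two blocks, in block order (2 before 5), so the intermediate value of block 5 never reaches the target and the original ordering is lost. In asynchronous mode, `active_write()` returns before the target is updated and `active_drain()` later forwards the queued writes in order; a real implementation would also have to bound that queue and handle errors on the target, which this sketch ignores.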