From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57989) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1elfdg-0007Gd-Om for qemu-devel@nongnu.org; Tue, 13 Feb 2018 13:49:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1elfdf-0006s5-2X for qemu-devel@nongnu.org; Tue, 13 Feb 2018 13:49:12 -0500 References: From: Eric Blake Message-ID: <1a0aa4be-adb5-43ab-03ba-ff18e941c460@redhat.com> Date: Tue, 13 Feb 2018 12:48:47 -0600 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] block_status automatically added flags List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy , qemu block , qemu-devel Cc: Max Reitz , Kevin Wolf , Paolo Bonzini On 02/13/2018 11:36 AM, Vladimir Sementsov-Ogievskiy wrote: > Hi Eric! >=20 > I'm now testing my nbd block status realization (block_status part, not= =20 > about dirty bitmaps), and faced into the following effect. >=20 > I created empty qcow2 image and wrote to the first sector, so >=20 > qemu-io -c map x >=20 > reports: >=20 > 64 KiB (0x10000) bytes=C2=A0=C2=A0=C2=A0=C2=A0 allocated at offset 0 by= tes (0x0) > 9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000) >=20 > But I can't get same results, when connecting to nbd server, exporting=20 > the same qcow2 file, I get >=20 > 10 MiB (0xa00000) bytes=C2=A0=C2=A0=C2=A0=C2=A0 allocated at offset 0 b= ytes (0x0) Is this with or without your NBD_CMD_BLOCK_STATUS patches applied? And=20 are you exposing the data over NBD as raw ('qemu-nbd -f qcow2'/'qemu-io=20 -f raw') or as qcow2 ('qemu-nbd -f raw'/'qemu-io -f qcow2')? /me does a quick reproduction Yes, I definitely see that behavior without any NBD_CMD_BLOCK_STATUS=20 patches and when the image is exposed over NBD as raw, but not when=20 exposed as qcow2, when testing the 2.11 release: $ qemu-img create -f qcow2 file3 10M Formatting 'file3', fmt=3Dqcow2 size=3D10485760 cluster_size=3D65536=20 lazy_refcounts=3Doff refcount_bits=3D16 $ qemu-io -c 'w 0 64k' -c map -f qcow2 file3 wrote 65536/65536 bytes at offset 0 64 KiB, 1 ops; 0.0579 sec (1.079 MiB/sec and 17.2601 ops/sec) 64 KiB (0x10000) bytes allocated at offset 0 bytes (0x0) 9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000) $ qemu-nbd -f qcow2 -x foo file3 $ qemu-io -f raw -c map nbd://localhost:10809/foo 10 MiB (0xa00000) bytes allocated at offset 0 bytes (0x0) $ qemu-nbd -f raw -x foo file3 $ qemu-io -f qcow2 -c map nbd://localhost:10809/foo 64 KiB (0x10000) bytes allocated at offset 0 bytes (0x0) 9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000) Right now, without NBD block status, the NBD driver reports the entire=20 file as allocated, as it can't do any better (NBD has no backing file,=20 and all data . Presumably, once NBD_CMD_BLOCK_STATUS is implemented, we=20 can then use that for more accurate information. >=20 >=20 > Finally, I understand the reason: >=20 > for local file, qemu-io calls bdrv_is_allocated, which calls=20 > bdrv_common_block_status_above with want_zero=3Dfalse. So, it doesn't s= et=20 > BDRV_BLOCK_ZERO, and doesn't set BDRV_BLOCK_ALLOCATED. 'qemu-io map' is a bit unusual; it is the only UI that easily exposes=20 bdrv_is_allocated() to the outside world ('qemu-img map' does not).=20 (The fact that both operations are named 'map' but do something=20 different is annoying; for back-compat reasons, we can't change=20 qemu-img, and I don't know if changing qemu-io is worth it.) > And, even if we=20 > change want_zero to true, Well, you'd do that by invoking bdrv_block_status() (via 'qemu-img map',=20 for example). > here, it will set BDRV_BLOCK_ZERO, but will=20 > not set BDRV_BLOCK_ALLOCATED, which contradicts with it's definition: >=20 > =C2=A0BDRV_BLOCK_ALLOCATED: the content of the block is determined by = this > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 layer (short= for DATA || ZERO), set by block layer This text is wrong; it gets fixed in my still-pending concluding series=20 for byte-based block status: https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00955.html Conceptually, BDRV_BLOCK_ALLOCATED means "is THIS layer of the backing=20 chain responsible for the contents at this guest offset"; and there are=20 cases where we know that we read zeroes but where the current layer is=20 not responsible for the contents (such as a qcow2 that has a backing=20 file with shorter length, where we return BDRV_BLOCK_ZERO but not=20 BDRV_BLOCK_ALLOCATED). But since NBD has no backing chain, the entire=20 image is considered allocated. Meanwhile, asking whether something is=20 allocated ('qemu-io -c map') is not the usual question you want to ask=20 when determining what portions of a file are zero. >=20 >=20 > for nbd, we go through the similar way on server (but with want_zero =3D= =20 > true), and we finally have BDRV_BLOCK_ZERO without BDRV_BLOCK_ALLOCATED= ,=20 > which maps to NBD_STATE_HOLE+NBD_STATE_ZERO. But then, in the client we= =20 > have BDRV_BLOCK_ZERO not automatically added by block layer but directl= y=20 > from nbd driver, therefor BDRV_BLOCK_ALLOCATED is set and I get=20 > different result. Drivers should never set BDRV_BLOCK_ALLOCATED; only the code in io.c=20 should set it; and output based on BDRV_BLOCK_ALLOCATED is only useful=20 in backing chain scenarios (which NBD does not have). >=20 >=20 > this all looks weird for me. >=20 > BDRV_BLOCK_ALLOCATED definition should be fixed, to show that this flag= =20 > show only reported by driver flags, not automatically added. If we need to report yet more flags, where the driver can report=20 additional information, then that's different. But changing the=20 BDRV_BLOCK_ALLOCATED semantics would probably have knock-on effects that=20 I'm not prepared to audit for (that is, we'd rather fix the=20 documentation to match reality, which my pending patch does, and NOT=20 change the code to match the current incorrect documentation). >=20 > And then the situation with nbd looks strange, because automatically=20 > added flag becomes "flag, reported by block driver". >=20 > On the other hand, it is not wrong. >=20 >=20 > I think, to test the feature, I'll improve qemu-io map, to show zeros=20 > (or is it better to add separate command?) and leave the other things a= s=20 > is. What do you think? If you want to add a new qemu-io command that matches more what=20 'qemu-img map' does, be my guest. In fact, renaming the existing=20 'qemu-io map' to something else may be warranted (qemu-io has a much=20 smaller user base, and is not documented as being stable, so back-compat=20 is not an issue, as long as iotests still pass). But I do know that=20 existing iotests depend on the current 'qemu-io map' semantics of being=20 a thin veneer around bdrv_is_allocated(), and that we can't change those=20 semantics. So if we do anything, adding a new command may be smarter. --=20 Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org