Block alignment of qcow2 compress driver

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* Block alignment of qcow2 compress driver
@ 2022-01-28 11:07 Richard W.M. Jones
  2022-01-28 11:39 ` Hanna Reitz
  2022-02-01 14:13 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 11:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: kwolf, andrey.shinkevich, hreitz, eblake

The commands below set up a sparse RAM disk, with an allocated block
at offset 32K and another one at offset 1M-32K.  Then it tries to copy
this to a compressed qcow2 file using qemu-nbd + the qemu compress
filter:

  $ qemu-img create -f qcow2 output.qcow2 1M
  $ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 & sleep 1
  $ nbdkit -U - \
           data '@32768 1*32768 @1015808 1*32768' \
           --run 'nbdcopy $uri nbd://localhost -p'

The nbdcopy command fails when zeroing the first 32K with:

  nbd://localhost: nbd_aio_zero: request is unaligned: Invalid argument

This is a bug in nbdcopy because it ignores the minimum block size
being correctly declared by the compress filter:

  $ nbdinfo nbd://localhost
  protocol: newstyle-fixed without TLS
  export="":
	export-size: 1048576 (1M)
	uri: nbd://localhost:10809/
	contexts:
  ...
		block_size_minimum: 65536          <----
		block_size_preferred: 65536
		block_size_maximum: 33554432

The compress filter sets the minimum block size to the the same as the
qcow2 cluster size here:

  https://gitlab.com/qemu-project/qemu/-/blob/cfe63e46be0a1f8a7fd2fd5547222f8344a43279/block/filter-compress.c#L117

I patched qemu to force this to 4K:

-    bs->bl.request_alignment = bdi.cluster_size;
+    //bs->bl.request_alignment = bdi.cluster_size;
+    bs->bl.request_alignment = 4096;

and the copy above works, and the output file is compressed!

So my question is, does the compress filter in qemu really need to
declare the large minimum block size?  I'm not especially concerned
about efficiency, I'd prefer it just worked, and changing nbdcopy to
understand block sizes is painful.

Is it already adjustable at run time?  (I tried using --image-opts
like compress.request_alignment=4096 but it seems like the filter
doesn't support anything I could think of, and I don't know how to
list the supported options.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:07 Block alignment of qcow2 compress driver Richard W.M. Jones
@ 2022-01-28 11:39 ` Hanna Reitz
  2022-01-28 11:48   ` Richard W.M. Jones
  2022-01-28 11:56   ` Richard W.M. Jones
  2022-02-01 14:13 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 2 replies; 14+ messages in thread
From: Hanna Reitz @ 2022-01-28 11:39 UTC (permalink / raw)
  To: Richard W.M. Jones, qemu-block, qemu-devel
  Cc: kwolf, andrey.shinkevich, eblake

On 28.01.22 12:07, Richard W.M. Jones wrote:
> The commands below set up a sparse RAM disk, with an allocated block
> at offset 32K and another one at offset 1M-32K.  Then it tries to copy
> this to a compressed qcow2 file using qemu-nbd + the qemu compress
> filter:
>
>    $ qemu-img create -f qcow2 output.qcow2 1M
>    $ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 & sleep 1
>    $ nbdkit -U - \
>             data '@32768 1*32768 @1015808 1*32768' \
>             --run 'nbdcopy $uri nbd://localhost -p'
>
> The nbdcopy command fails when zeroing the first 32K with:
>
>    nbd://localhost: nbd_aio_zero: request is unaligned: Invalid argument
>
> This is a bug in nbdcopy because it ignores the minimum block size
> being correctly declared by the compress filter:
>
>    $ nbdinfo nbd://localhost
>    protocol: newstyle-fixed without TLS
>    export="":
> 	export-size: 1048576 (1M)
> 	uri: nbd://localhost:10809/
> 	contexts:
>    ...
> 		block_size_minimum: 65536          <----
> 		block_size_preferred: 65536
> 		block_size_maximum: 33554432
>
> The compress filter sets the minimum block size to the the same as the
> qcow2 cluster size here:
>
>    https://gitlab.com/qemu-project/qemu/-/blob/cfe63e46be0a1f8a7fd2fd5547222f8344a43279/block/filter-compress.c#L117
>
> I patched qemu to force this to 4K:
>
> -    bs->bl.request_alignment = bdi.cluster_size;
> +    //bs->bl.request_alignment = bdi.cluster_size;
> +    bs->bl.request_alignment = 4096;
>
> and the copy above works, and the output file is compressed!
>
> So my question is, does the compress filter in qemu really need to
> declare the large minimum block size?  I'm not especially concerned
> about efficiency, I'd prefer it just worked, and changing nbdcopy to
> understand block sizes is painful.
>
> Is it already adjustable at run time?  (I tried using --image-opts
> like compress.request_alignment=4096 but it seems like the filter
> doesn't support anything I could think of, and I don't know how to
> list the supported options.)

I’m kind of amazed this works because the qcow2 driver rejects unaligned 
compressed writes[1].

And if I apply your diff, I can see that, too:
$ ./qemu-img create -f qcow2 test.qcow2 64M
Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=67108864 lazy_refcounts=off refcount_bits=16

$ ./qemu-io -c 'write 0 32k' --image-opts \
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
write failed: Invalid argument

So I actually don’t know why it works for you.  OTOH, I don’t understand 
why the block size affects you over NBD, because I would have expected 
qemu to internally auto-align requests when they are not aligned (in 
bdrv_co_pwritev_part()).  Like, when I set the NBD block driver’s 
alignment to 512[2], the following still succeeds:

$ ./qemu-img create -f qcow2 test.qcow2 64M
Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=67108864 lazy_refcounts=off refcount_bits=16

$ ./qemu-nbd --fork --image-opts \
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2

$ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
wrote 32768/32768 bytes at offset 0
32 KiB, 1 ops; 00.00 sec (9.034 MiB/sec and 289.0960 ops/sec)


So I wonder why your diff works on your end, and I also wonder whether 
we couldn’t just have the NBD server expose some custom alignment 
(because qemu should auto-align all requests the NBD server generates 
anyway).

Hanna

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/block/qcow2.c#L4662
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L1918



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:39 ` Hanna Reitz
@ 2022-01-28 11:48   ` Richard W.M. Jones
  2022-01-28 11:57     ` Hanna Reitz
  2022-01-28 11:56   ` Richard W.M. Jones
  1 sibling, 1 reply; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 11:48 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block

On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> So I actually don’t know why it works for you.  OTOH, I don’t
> understand why the block size affects you over NBD, because I would
> have expected qemu to internally auto-align requests when they are
> not aligned (in bdrv_co_pwritev_part()).

I checked it again and my hack definitely fixes nbdcopy.  But maybe
that's expected if qemu-nbd is auto-aligning requests?  (I'm only
accessing the block layer through qemu-nbd, not with qemu-io)

> Like, when I set the NBD
> block driver’s alignment to 512[2], the following still succeeds:

Did you just patch that line in the code or is there a qemu-nbd
option/image-opts to do this?

Rich.

> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/block/qcow2.c#L4662
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L1918

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:48   ` Richard W.M. Jones
@ 2022-01-28 11:57     ` Hanna Reitz
  2022-01-28 12:18       ` Richard W.M. Jones
  0 siblings, 1 reply; 14+ messages in thread
From: Hanna Reitz @ 2022-01-28 11:57 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block

On 28.01.22 12:48, Richard W.M. Jones wrote:
> On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
>> So I actually don’t know why it works for you.  OTOH, I don’t
>> understand why the block size affects you over NBD, because I would
>> have expected qemu to internally auto-align requests when they are
>> not aligned (in bdrv_co_pwritev_part()).
> I checked it again and my hack definitely fixes nbdcopy.  But maybe
> that's expected if qemu-nbd is auto-aligning requests?  (I'm only
> accessing the block layer through qemu-nbd, not with qemu-io)

It’s not just qemu-io, with your diff[3] I get the same EINVAL over NBD, 
too:

$ ./qemu-img create -f qcow2 test.qcow2 64M
Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=67108864 lazy_refcounts=off refcount_bits=16

$ ./qemu-nbd --fork --image-opts \
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2

$ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
write failed: Invalid argument

>> Like, when I set the NBD
>> block driver’s alignment to 512[2], the following still succeeds:
> Did you just patch that line in the code or is there a qemu-nbd
> option/image-opts to do this?

I just changed that line of code [2], as shown in [4].  I suppose the 
better thing to do would be to have an option for the NBD server to 
force-change the announced request alignment, because it can expect the 
qemu block layer code to auto-align requests through RMW.  Doing it in 
the client is wrong, because the NBD server might want to detect that 
the client sends unaligned requests and reject them (though ours 
doesn’t, it just traces such events[5] – note that it’s explicitly noted 
there that qemu will auto-align requests).

Hanna

> Rich.
>
>> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/block/qcow2.c#L4662
>> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L1918

[3]
diff --git a/block/filter-compress.c b/block/filter-compress.c
index d5be538619..5a11d77231 100644
--- a/block/filter-compress.c
+++ b/block/filter-compress.c
@@ -114,7 +114,7 @@ static void compress_refresh_limits(BlockDriverState 
*bs, Error **errp)
          return;
      }

-    bs->bl.request_alignment = bdi.cluster_size;
+    bs->bl.request_alignment = 4096;
  }

[4]
diff --git a/block/nbd.c b/block/nbd.c
index 63dbfa807d..8608055800 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1915,7 +1915,7 @@ static void nbd_refresh_limits(BlockDriverState 
*bs, Error **errp)
                 s->info.base_allocation) ? 1 : BDRV_SECTOR_SIZE;
      }

-    bs->bl.request_alignment = min;
+    bs->bl.request_alignment = 512;
      bs->bl.max_pdiscard = QEMU_ALIGN_DOWN(INT_MAX, min);
      bs->bl.max_pwrite_zeroes = max;
      bs->bl.max_transfer = max;

[5] https://gitlab.com/qemu-project/qemu/-/blob/master/nbd/server.c#L2355



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:57     ` Hanna Reitz
@ 2022-01-28 12:18       ` Richard W.M. Jones
  2022-01-28 12:30         ` Hanna Reitz
  0 siblings, 1 reply; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 12:18 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block

On Fri, Jan 28, 2022 at 12:57:47PM +0100, Hanna Reitz wrote:
> On 28.01.22 12:48, Richard W.M. Jones wrote:
> >On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> >>So I actually don’t know why it works for you.  OTOH, I don’t
> >>understand why the block size affects you over NBD, because I would
> >>have expected qemu to internally auto-align requests when they are
> >>not aligned (in bdrv_co_pwritev_part()).
> >I checked it again and my hack definitely fixes nbdcopy.  But maybe
> >that's expected if qemu-nbd is auto-aligning requests?  (I'm only
> >accessing the block layer through qemu-nbd, not with qemu-io)
> 
> It’s not just qemu-io, with your diff[3] I get the same EINVAL over
> NBD, too:
> 
> $ ./qemu-img create -f qcow2 test.qcow2 64M
> Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536
> extended_l2=off compression_type=zlib size=67108864
> lazy_refcounts=off refcount_bits=16
> 
> $ ./qemu-nbd --fork --image-opts \
> driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
> 
> $ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
> write failed: Invalid argument

Strange - is that error being generated by qemu's nbd client code?

Here's my test not involving qemu's client code:

$ qemu-nbd --version
qemu-nbd 6.2.0 (qemu-6.2.0-2.fc36)

$ qemu-img create -f qcow2 output.qcow2 1M
Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16

$ qemu-nbd --fork --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2

$ nbdsh -u nbd://localhost
nbd> h.get_strict_mode()
31
nbd> h.set_strict_mode(31 & ~nbd.STRICT_ALIGN)
nbd> h.get_strict_mode()
15
nbd> h.pwrite(b'1'*1024, 0)
nbd> exit

So an unaligned 1K write works (after disabling libnbd's client-side
alignment checks).

> I just changed that line of code [2], as shown in [4].  I suppose
> the better thing to do would be to have an option for the NBD server
> to force-change the announced request alignment, because it can
> expect the qemu block layer code to auto-align requests through
> RMW.  Doing it in the client is wrong, because the NBD server might
> want to detect that the client sends unaligned requests and reject
> them (though ours doesn’t, it just traces such events[5] – note that
> it’s explicitly noted there that qemu will auto-align requests).

I know I said I didn't care about performance (in this case), but is
there in fact a penalty to sending unaligned requests to the qcow2
layer?  Or perhaps it cannot compress them?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 12:18       ` Richard W.M. Jones
@ 2022-01-28 12:30         ` Hanna Reitz
  2022-01-28 13:19           ` Kevin Wolf
  2022-01-28 13:30           ` Richard W.M. Jones
  0 siblings, 2 replies; 14+ messages in thread
From: Hanna Reitz @ 2022-01-28 12:30 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block

On 28.01.22 13:18, Richard W.M. Jones wrote:
> On Fri, Jan 28, 2022 at 12:57:47PM +0100, Hanna Reitz wrote:
>> On 28.01.22 12:48, Richard W.M. Jones wrote:
>>> On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
>>>> So I actually don’t know why it works for you.  OTOH, I don’t
>>>> understand why the block size affects you over NBD, because I would
>>>> have expected qemu to internally auto-align requests when they are
>>>> not aligned (in bdrv_co_pwritev_part()).
>>> I checked it again and my hack definitely fixes nbdcopy.  But maybe
>>> that's expected if qemu-nbd is auto-aligning requests?  (I'm only
>>> accessing the block layer through qemu-nbd, not with qemu-io)
>> It’s not just qemu-io, with your diff[3] I get the same EINVAL over
>> NBD, too:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>> Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536
>> extended_l2=off compression_type=zlib size=67108864
>> lazy_refcounts=off refcount_bits=16
>>
>> $ ./qemu-nbd --fork --image-opts \
>> driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
>>
>> $ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
>> write failed: Invalid argument
> Strange - is that error being generated by qemu's nbd client code?

It’s generated by qcow2, namely the exact place I pointed out (as [1]).  
I can see that when I put an fprintf there.

> Here's my test not involving qemu's client code:
>
> $ qemu-nbd --version
> qemu-nbd 6.2.0 (qemu-6.2.0-2.fc36)
>
> $ qemu-img create -f qcow2 output.qcow2 1M
> Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
>
> $ qemu-nbd --fork --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
>
> $ nbdsh -u nbd://localhost
> nbd> h.get_strict_mode()
> 31
> nbd> h.set_strict_mode(31 & ~nbd.STRICT_ALIGN)
> nbd> h.get_strict_mode()
> 15
> nbd> h.pwrite(b'1'*1024, 0)
> nbd> exit
>
> So an unaligned 1K write works (after disabling libnbd's client-side
> alignment checks).

It does work when having the NBD client side simply ignore the request 
alignment (my diff [4]), but it doesn’t work (for me) when only reducing 
the compress filter’s request alignment (my diff [3]).

>> I just changed that line of code [2], as shown in [4].  I suppose
>> the better thing to do would be to have an option for the NBD server
>> to force-change the announced request alignment, because it can
>> expect the qemu block layer code to auto-align requests through
>> RMW.  Doing it in the client is wrong, because the NBD server might
>> want to detect that the client sends unaligned requests and reject
>> them (though ours doesn’t, it just traces such events[5] – note that
>> it’s explicitly noted there that qemu will auto-align requests).
> I know I said I didn't care about performance (in this case), but is
> there in fact a penalty to sending unaligned requests to the qcow2
> layer?  Or perhaps it cannot compress them?

In qcow2, only the whole cluster can be compressed, so writing 
compressed data means having to write the whole cluster.  qcow2 could 
implement the padding by itself, but we decided to just leave the burden 
of only writing full clusters (with the COMPRESSED write flag) on the 
callers.

Things like qemu-img convert and blockdev-backup just adhere to that by 
design; and the compress driver makes sure to set its request alignment 
accordingly so that requests to it will always be aligned to the cluster 
size (either by its user, or by the qemu block layer which performs the 
padding automatically).

Hanna



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 12:30         ` Hanna Reitz
@ 2022-01-28 13:19           ` Kevin Wolf
  2022-01-28 13:36             ` Richard W.M. Jones
  2022-01-28 13:30           ` Richard W.M. Jones
  1 sibling, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2022-01-28 13:19 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: andrey.shinkevich, eblake, Richard W.M. Jones, qemu-block,
	qemu-devel

Am 28.01.2022 um 13:30 hat Hanna Reitz geschrieben:
> > > I just changed that line of code [2], as shown in [4].  I suppose
> > > the better thing to do would be to have an option for the NBD server
> > > to force-change the announced request alignment, because it can
> > > expect the qemu block layer code to auto-align requests through
> > > RMW.  Doing it in the client is wrong, because the NBD server might
> > > want to detect that the client sends unaligned requests and reject
> > > them (though ours doesn’t, it just traces such events[5] – note that
> > > it’s explicitly noted there that qemu will auto-align requests).
> > I know I said I didn't care about performance (in this case), but is
> > there in fact a penalty to sending unaligned requests to the qcow2
> > layer?  Or perhaps it cannot compress them?
> 
> In qcow2, only the whole cluster can be compressed, so writing compressed
> data means having to write the whole cluster.  qcow2 could implement the
> padding by itself, but we decided to just leave the burden of only writing
> full clusters (with the COMPRESSED write flag) on the callers.
> 
> Things like qemu-img convert and blockdev-backup just adhere to that by
> design; and the compress driver makes sure to set its request alignment
> accordingly so that requests to it will always be aligned to the cluster
> size (either by its user, or by the qemu block layer which performs the
> padding automatically).

I thought the more limiting factor would be that after auto-aligning the
first request by padding with zeros, the second request to the same
cluster would fail because compression doesn't allow using an already
allocated cluster:

    /* Compression can't overwrite anything. Fail if the cluster was already
     * allocated. */
    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
    if (cluster_offset & L2E_OFFSET_MASK) {
        qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
        return -EIO;
    }

Did you always just test a single request or why don't you run into
this?

I guess checking L2E_OFFSET_MASK is strictly speaking wrong because it's
invalid for compressed clusters (qcow2_get_cluster_type() feels more
appropriate), but in practice, you will always have non-zero data there,
so it should error out here.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 13:19           ` Kevin Wolf
@ 2022-01-28 13:36             ` Richard W.M. Jones
  0 siblings, 0 replies; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 13:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: andrey.shinkevich, Hanna Reitz, eblake, qemu-devel, qemu-block

On Fri, Jan 28, 2022 at 02:19:44PM +0100, Kevin Wolf wrote:
> Am 28.01.2022 um 13:30 hat Hanna Reitz geschrieben:
> > > > I just changed that line of code [2], as shown in [4].  I suppose
> > > > the better thing to do would be to have an option for the NBD server
> > > > to force-change the announced request alignment, because it can
> > > > expect the qemu block layer code to auto-align requests through
> > > > RMW.  Doing it in the client is wrong, because the NBD server might
> > > > want to detect that the client sends unaligned requests and reject
> > > > them (though ours doesn’t, it just traces such events[5] – note that
> > > > it’s explicitly noted there that qemu will auto-align requests).
> > > I know I said I didn't care about performance (in this case), but is
> > > there in fact a penalty to sending unaligned requests to the qcow2
> > > layer?  Or perhaps it cannot compress them?
> > 
> > In qcow2, only the whole cluster can be compressed, so writing compressed
> > data means having to write the whole cluster.  qcow2 could implement the
> > padding by itself, but we decided to just leave the burden of only writing
> > full clusters (with the COMPRESSED write flag) on the callers.
> > 
> > Things like qemu-img convert and blockdev-backup just adhere to that by
> > design; and the compress driver makes sure to set its request alignment
> > accordingly so that requests to it will always be aligned to the cluster
> > size (either by its user, or by the qemu block layer which performs the
> > padding automatically).
> 
> I thought the more limiting factor would be that after auto-aligning the
> first request by padding with zeros, the second request to the same
> cluster would fail because compression doesn't allow using an already
> allocated cluster:
> 
>     /* Compression can't overwrite anything. Fail if the cluster was already
>      * allocated. */
>     cluster_offset = get_l2_entry(s, l2_slice, l2_index);
>     if (cluster_offset & L2E_OFFSET_MASK) {
>         qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
>         return -EIO;
>     }
> 
> Did you always just test a single request or why don't you run into
> this?

I didn't test that one specifically and yes it does fail:

$ qemu-img create -f qcow2 output.qcow2 1M
Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
$ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 &
[1] 2069037

$ nbdsh -u nbd://localhost
nbd> h.set_strict_mode(h.get_strict_mode() & ~nbd.STRICT_ALIGN)
nbd> buf = b'1' * 1024
nbd> h.pwrite(buf, 0)
nbd> h.pwrite(buf, 1024)
Traceback (most recent call last):
  File "/usr/lib64/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/lib64/python3.10/site-packages/nbd.py", line 1631, in pwrite
    return libnbdmod.pwrite(self._o, buf, offset, flags)
nbd.Error: nbd_pwrite: write: command failed: Input/output error (EIO)

So what I said in the previous email about about minimum vs preferred
is wrong :-(

What's more interesting is that nbdcopy still appeared to work.
Simulating what that was doing would be something like which
also fails when I do it directly:

nbd> h.pwrite(buf, 0)
nbd> h.zero(1024, 1024)
Traceback (most recent call last):
  File "/usr/lib64/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/lib64/python3.10/site-packages/nbd.py", line 1782, in zero
    return libnbdmod.zero(self._o, count, offset, flags)
nbd.Error: nbd_zero: write-zeroes: command failed: Input/output error (EIO)

Anyway back to poking at nbdcopy to make it support block sizes ...

> I guess checking L2E_OFFSET_MASK is strictly speaking wrong because it's
> invalid for compressed clusters (qcow2_get_cluster_type() feels more
> appropriate), but in practice, you will always have non-zero data there,
> so it should error out here.
> 
> Kevin

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 12:30         ` Hanna Reitz
  2022-01-28 13:19           ` Kevin Wolf
@ 2022-01-28 13:30           ` Richard W.M. Jones
  2022-01-28 13:37             ` Richard W.M. Jones
  2022-01-28 21:22             ` Eric Blake
  1 sibling, 2 replies; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 13:30 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block


On Fri, Jan 28, 2022 at 01:30:43PM +0100, Hanna Reitz wrote:
> On 28.01.22 13:18, Richard W.M. Jones wrote:
> >On Fri, Jan 28, 2022 at 12:57:47PM +0100, Hanna Reitz wrote:
> >>On 28.01.22 12:48, Richard W.M. Jones wrote:
> >>>On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> >>>>So I actually don’t know why it works for you.  OTOH, I don’t
> >>>>understand why the block size affects you over NBD, because I would
> >>>>have expected qemu to internally auto-align requests when they are
> >>>>not aligned (in bdrv_co_pwritev_part()).
> >>>I checked it again and my hack definitely fixes nbdcopy.  But maybe
> >>>that's expected if qemu-nbd is auto-aligning requests?  (I'm only
> >>>accessing the block layer through qemu-nbd, not with qemu-io)
> >>It’s not just qemu-io, with your diff[3] I get the same EINVAL over
> >>NBD, too:
> >>
> >>$ ./qemu-img create -f qcow2 test.qcow2 64M
> >>Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536
> >>extended_l2=off compression_type=zlib size=67108864
> >>lazy_refcounts=off refcount_bits=16
> >>
> >>$ ./qemu-nbd --fork --image-opts \
> >>driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
> >>
> >>$ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
> >>write failed: Invalid argument
> >Strange - is that error being generated by qemu's nbd client code?
> 
> It’s generated by qcow2, namely the exact place I pointed out (as
> [1]).  I can see that when I put an fprintf there.

I can't reproduce this behaviour (with qemu @ cfe63e46be0a, the head
of git at time of writing).  I wonder if I'm doing something wrong?

  ++ /home/rjones/d/qemu/build/qemu-img create -f qcow2 output.qcow2 64k
  Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=65536 lazy_refcounts=off refcount_bits=16
  ++ sleep 1
  ++ /home/rjones/d/qemu/build/qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
  ++ /home/rjones/d/qemu/build/qemu-io -c 'write 0 32k' -f raw nbd://localhost
  wrote 32768/32768 bytes at offset 0
  32 KiB, 1 ops; 00.02 sec (1.547 MiB/sec and 49.5067 ops/sec)

> >I know I said I didn't care about performance (in this case), but is
> >there in fact a penalty to sending unaligned requests to the qcow2
> >layer?  Or perhaps it cannot compress them?
> 
> In qcow2, only the whole cluster can be compressed, so writing
> compressed data means having to write the whole cluster.  qcow2
> could implement the padding by itself, but we decided to just leave
> the burden of only writing full clusters (with the COMPRESSED write
> flag) on the callers.

I feel like this may be a bug in what qemu-nbd advertises.  Currently
it is:

$ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 &
[2] 2068900
$ nbdinfo nbd://localhost
protocol: newstyle-fixed without TLS
export="":
	export-size: 65536 (64K)
	uri: nbd://localhost:10809/
	contexts:
		base:allocation
		is_rotational: false
		is_read_only: false
		can_cache: true
		can_df: true
		can_fast_zero: true
		can_flush: true
		can_fua: true
		can_multi_conn: false
		can_trim: true
		can_zero: true
		block_size_minimum: 65536    <---
		block_size_preferred: 65536
		block_size_maximum: 33554432

block_size_preferred is (rightly) set to 64K, as that's what the
compress + qcow2 combination prefers.

But block_size_minimum sounds as if it should be 512 or 1, if qemu-nbd
is able to reassemble smaller than preferred requests, even if they
are suboptimal.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 13:30           ` Richard W.M. Jones
@ 2022-01-28 13:37             ` Richard W.M. Jones
  2022-01-28 21:22             ` Eric Blake
  1 sibling, 0 replies; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 13:37 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block

On Fri, Jan 28, 2022 at 01:30:53PM +0000, Richard W.M. Jones wrote:
> I feel like this may be a bug in what qemu-nbd advertises.  Currently
> it is:

Ignore this email, see other reply.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 13:30           ` Richard W.M. Jones
  2022-01-28 13:37             ` Richard W.M. Jones
@ 2022-01-28 21:22             ` Eric Blake
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Blake @ 2022-01-28 21:22 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: kwolf, andrey.shinkevich, Hanna Reitz, qemu-devel, qemu-block

On Fri, Jan 28, 2022 at 01:30:53PM +0000, Richard W.M. Jones wrote:
> > 
> > In qcow2, only the whole cluster can be compressed, so writing
> > compressed data means having to write the whole cluster.  qcow2
> > could implement the padding by itself, but we decided to just leave
> > the burden of only writing full clusters (with the COMPRESSED write
> > flag) on the callers.
> 
> I feel like this may be a bug in what qemu-nbd advertises.  Currently
> it is:
> 
> $ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 &
> [2] 2068900
> $ nbdinfo nbd://localhost

> 		block_size_minimum: 65536    <---
> 		block_size_preferred: 65536
> 		block_size_maximum: 33554432
> 
> block_size_preferred is (rightly) set to 64K, as that's what the
> compress + qcow2 combination prefers.
> 
> But block_size_minimum sounds as if it should be 512 or 1, if qemu-nbd
> is able to reassemble smaller than preferred requests, even if they
> are suboptimal.

When compression is involved, 64k is the minimum block size at the
qcow2 layer, but the qemu NBD layer is relying on the generic block
core code to do RMW on anything smaller than that.  If the RMW doesn't
work, we may have a bug in the block layer.  Even if it does appear to
work, I'm not sure whether the block layer is able to recompress a
cluster - it may be that the act of RMW on a partially-written
initially-compressed cluster causes that cluster to no longer be
compressed, at which point, while your write succeeded, you are no
longer getting any compression.

So, while it is a nice QoI feature of qemu-nbd that we can rely on the
block layer RMW to accept client requests that were smaller than the
advertised minimum block size, I still think the advertised size is
correct, and that the client is in violation of the spec if it is
requesting but then not honoring the advertised size.  And yes, while
it is a pain to hack nbdcopy to pay more attention to block sizing, I
think in the long run it will be worth it.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:39 ` Hanna Reitz
  2022-01-28 11:48   ` Richard W.M. Jones
@ 2022-01-28 11:56   ` Richard W.M. Jones
  2022-01-28 21:40     ` Eric Blake
  1 sibling, 1 reply; 14+ messages in thread
From: Richard W.M. Jones @ 2022-01-28 11:56 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: kwolf, andrey.shinkevich, eblake, qemu-devel, qemu-block


I hacked nbdcopy to ignore block alignment (the error actually comes
from libnbd refusing to send the unaligned request, not from
qemu-nbd), and indeed qemu-nbd accepts the unaligned request without
complaint.

Eric - maybe having some flag for nbdcopy to ignore unaligned requests
when we know the server doesn't care (qemu-nbd) would work?

Rich.

--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -59,6 +59,10 @@ open_one_nbd_handle (struct rw_nbd *rwn)
     exit (EXIT_FAILURE);
   }
 
+  uint32_t sm = nbd_get_strict_mode (nbd);
+  sm &= ~LIBNBD_STRICT_ALIGN;
+  nbd_set_strict_mode (nbd, sm);
+
   nbd_set_debug (nbd, verbose);
 
   if (extents && rwn->d == READING &&


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:56   ` Richard W.M. Jones
@ 2022-01-28 21:40     ` Eric Blake
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2022-01-28 21:40 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: kwolf, qemu-block, qemu-devel, Hanna Reitz, andrey.shinkevich,
	libguestfs

Adding libnbd list (libguestfs) in cc

On Fri, Jan 28, 2022 at 11:56:19AM +0000, Richard W.M. Jones wrote:
> 
> I hacked nbdcopy to ignore block alignment (the error actually comes
> from libnbd refusing to send the unaligned request, not from
> qemu-nbd), and indeed qemu-nbd accepts the unaligned request without
> complaint.

And only after I already replied to your other email, did I then see
your followup recommending to read this one instead ;)

The NBD spec says the client is non-complying when sending under-sized
requests.  If the server accepts it anyway (presumably with RMW
performance pessimizations, as qemu-nbd does), that's a QoI bonus.

> 
> Eric - maybe having some flag for nbdcopy to ignore unaligned requests
> when we know the server doesn't care (qemu-nbd) would work?

Yeah, that might make sense - a command-line option for stating "I
know the server has a nice QoI feature, and I don't mind the
performance pessimization".

Another thing to consider: the way the NBD spec is written, the rules
about a client sending unaligned requests being non-compliant only
applies to a client that requested block size information in the first
place.  If the client did not request block alignment information, the
server should honor anything at alignment of 512 or above (even if it
would prefer a larger minimum); performance may suffer, but this is
needed to cater to older clients that don't know how to request
alignments - and what's more, qemu-nbd specifically has code that
changes what it advertises if the client did not query (that is, an
advertisement of 64k is ONLY possible if the client requested
alignment details).

So maybe the question becomes whether libnbd needs a knob on whether
to request alignment information.  Libnbd commit 9e9c74755 (libnbd
v1.3.10) is where I added the code to unconditionally query for
alignment info.  Given that we now know of a case where NOT querying
causes qemu-nbd to behave differently in what it advertises, adding
such a knob to the API makes total sense, at which point, 'nbdcopy
--ignore-align' becomes a way to request that the client not request
alignment in the first place, rather than as a way to call
nbd_set_strict_mode() to turn off alignment checking.

Looks like I have some API work do propose in libnbd...

> 
> Rich.
> 
> --- a/copy/nbd-ops.c
> +++ b/copy/nbd-ops.c
> @@ -59,6 +59,10 @@ open_one_nbd_handle (struct rw_nbd *rwn)
>      exit (EXIT_FAILURE);
>    }
>  
> +  uint32_t sm = nbd_get_strict_mode (nbd);
> +  sm &= ~LIBNBD_STRICT_ALIGN;
> +  nbd_set_strict_mode (nbd, sm);
> +
>    nbd_set_debug (nbd, verbose);
>  
>    if (extents && rwn->d == READING &&
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Block alignment of qcow2 compress driver
  2022-01-28 11:07 Block alignment of qcow2 compress driver Richard W.M. Jones
  2022-01-28 11:39 ` Hanna Reitz
@ 2022-02-01 14:13 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-01 14:13 UTC (permalink / raw)
  To: Richard W.M. Jones, qemu-block, qemu-devel
  Cc: kwolf, andrey.shinkevich, hreitz, eblake

28.01.2022 14:07, Richard W.M. Jones wrote:
> The commands below set up a sparse RAM disk, with an allocated block
> at offset 32K and another one at offset 1M-32K.  Then it tries to copy
> this to a compressed qcow2 file using qemu-nbd + the qemu compress
> filter:
> 
>    $ qemu-img create -f qcow2 output.qcow2 1M
>    $ qemu-nbd -t --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2 & sleep 1
>    $ nbdkit -U - \
>             data '@32768 1*32768 @1015808 1*32768' \
>             --run 'nbdcopy $uri nbd://localhost -p'
> 
> The nbdcopy command fails when zeroing the first 32K with:
> 
>    nbd://localhost: nbd_aio_zero: request is unaligned: Invalid argument
> 
> This is a bug in nbdcopy because it ignores the minimum block size
> being correctly declared by the compress filter:
> 
>    $ nbdinfo nbd://localhost
>    protocol: newstyle-fixed without TLS
>    export="":
> 	export-size: 1048576 (1M)
> 	uri: nbd://localhost:10809/
> 	contexts:
>    ...
> 		block_size_minimum: 65536          <----
> 		block_size_preferred: 65536
> 		block_size_maximum: 33554432
> 
> The compress filter sets the minimum block size to the the same as the
> qcow2 cluster size here:
> 
>    https://gitlab.com/qemu-project/qemu/-/blob/cfe63e46be0a1f8a7fd2fd5547222f8344a43279/block/filter-compress.c#L117
> 
> I patched qemu to force this to 4K:
> 
> -    bs->bl.request_alignment = bdi.cluster_size;
> +    //bs->bl.request_alignment = bdi.cluster_size;
> +    bs->bl.request_alignment = 4096;
> 
> and the copy above works, and the output file is compressed!
> 
> So my question is, does the compress filter in qemu really need to
> declare the large minimum block size?  I'm not especially concerned
> about efficiency, I'd prefer it just worked, and changing nbdcopy to
> understand block sizes is painful.
> 
> Is it already adjustable at run time?  (I tried using --image-opts
> like compress.request_alignment=4096 but it seems like the filter
> doesn't support anything I could think of, and I don't know how to
> list the supported options.)
> 


Hi!

I didn't read the whole thread, so in case it was not mentioned:

There is a limitation about compressed writes in qcow2 driver in Qemu: you can't do compressed write to the same cluster twice, the second write will fail. So, when we do some copying (or backup) and write cluster by cluster (or at least cluster-aligend) everything works. If you write partial cluster (with compress filter) it may work due to automatic RMW, but when you than try to write second part of same cluster it will fail.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-02-01 16:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-28 11:07 Block alignment of qcow2 compress driver Richard W.M. Jones
2022-01-28 11:39 ` Hanna Reitz
2022-01-28 11:48   ` Richard W.M. Jones
2022-01-28 11:57     ` Hanna Reitz
2022-01-28 12:18       ` Richard W.M. Jones
2022-01-28 12:30         ` Hanna Reitz
2022-01-28 13:19           ` Kevin Wolf
2022-01-28 13:36             ` Richard W.M. Jones
2022-01-28 13:30           ` Richard W.M. Jones
2022-01-28 13:37             ` Richard W.M. Jones
2022-01-28 21:22             ` Eric Blake
2022-01-28 11:56   ` Richard W.M. Jones
2022-01-28 21:40     ` Eric Blake
2022-02-01 14:13 ` Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).