* qemu-img convert vs writing another copy tool
From: Richard W.M. Jones @ 2020-01-23 18:35 UTC
To: qemu-devel, qemu-block, mreitz, eblake, berrange, mkletzan,
ptoscano
Cc: marnold
I guess some people are aware that virt-v2v, which is a tool which
converts guests from VMware to run on KVM, and some other
OpenStack-OpenStack migration tools we have, use "qemu-img convert" to
copy the data around.
Historically we've had bugs here. The most recent was discussed in
the thread on this list called "Bug? qemu-img convert to preallocated
image makes it sparse"
(https://www.mail-archive.com/qemu-block@nongnu.org/msg60479.html)
We've been kicking around the idea of writing some alternate tool. My
proposal would be a tool (not yet written, maybe it will never be
written) called nbdcp for copying between NBD servers and local files.
An outline manual page for this proposed tool is attached.
Some of the things which this tool might do which qemu-img convert
cannot do right now:
- Hint that the target already contains zeroes. It's almost always
the case that we know this, but we cannot tell qemu. This was the
cause of a big performance regression last year.
- Declare that we want the target to be either sparse or
preallocated. qemu-img convert can sort of do this in a
round-about way (create the target in advance and use the -n
option), but also it's broken at the moment.
- NBD multi-conn. In my tests this makes a really massive
performance difference in certain situations. Again, virt-v2v has
a lot of information that we cannot pass to qemu: we know, for
example, exactly if the server supports the feature, how many
threads are available, in some situations even have information
about the network and backing disks that the data will travel over
/ be stored on.
- Machine-parsable progress bars. You can, sort of, parse the
progress bar from qemu-img convert, but it's not as easy as it
could be. In particular it would be nice if the format was treated
as ABI, and if there was a way to have the tool write the progress
bar info to a precreated file descriptor.
- External block lists. This is a rather obscure requirement, but
it's necessary in the case where we can get the allocated block map
from another source (eg. pyvmomi) and then want to use that with an
NBD source that does not support extents (eg. nbdkit-ssh-plugin /
libssh / sftp). [Having said that, it may be possible to implement
this as an nbdkit filter, so maybe this is not a blocking feature.]
One thing which qemu-img convert can do which nbdcp could not:
- Read or write from qcow2 files.
So instead of splitting the ecosystem and writing a new tool that
doesn't do as much as qemu-img convert, I wonder what qemu developers
think about the above missing features? For example, are they in
scope for qemu-img convert?
Rich.
----------------------------------------------------------------------
nbdcp(1) LIBNBD nbdcp(1)
NAME
nbdcp - copy between NBD servers and local files
SYNOPSIS
nbdcp [-a|--target-allocation allocated|sparse]
[-b|--block-list <blocksfile>]
[-m|--multi-conn <n>] [-M|--multi-conn-target <n>]
[-p|--progress-bar] [-S|--sparse-detect <n>]
[-T|--threads <n>] [-z|--target-is-zero]
'nbd://...'|DISK.IMG 'nbd://...'|DISK.IMG
DESCRIPTION
nbdcp is a utility that can copy quickly between NBD servers and local
raw format files (or block devices). It can copy:
from NBD server to file (or block device)
For example, this command copies from the NBD server listening on
port 10809 on "example.com" to a local file called disk.img:
nbdcp nbd://example.com disk.img
from file (or block device) to NBD server
For example, this command copies from a local block device /dev/sda
to the NBD server listening on Unix domain socket /tmp/socket:
nbdcp /dev/sda 'nbd+unix:///?socket=/tmp/socket'
from NBD server to NBD server
For example this copies between two different exports on the same
NBD server:
nbdcp nbd://example.com/export1 nbd://example.com/export2
This program cannot: copy from file to file (use cp(1) or dd(1)), copy
to or from formats other than raw (use qemu-img(1) convert), or access
servers other than NBD servers (also use qemu-img(1)).
NBD servers are specified by their URI, following the NBD URI standard
at https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
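For illustration only, here is a minimal bash sketch that splits the two URI forms used in the examples above into their parts. The function name parse_nbd_uri is hypothetical. It deliberately covers only a subset of the URI standard: no TLS (nbds) schemes, no IPv6 literals and no percent-decoding.

```shell
# Requires bash (uses [[ =~ ]] and BASH_REMATCH).
# Hypothetical helper: handles only nbd://HOST[:PORT][/EXPORT] and
# nbd+unix:///[EXPORT]?socket=PATH.
parse_nbd_uri() {
    local uri=$1
    local re_tcp='^nbd://([^/:?]+)(:([0-9]+))?(/([^?]*))?$'
    local re_unix='^nbd\+unix:///([^?]*)\?socket=([^&]+)$'
    if [[ $uri =~ $re_tcp ]]; then
        # Port defaults to 10809, the registered NBD port.
        echo "tcp host=${BASH_REMATCH[1]} port=${BASH_REMATCH[3]:-10809} export=${BASH_REMATCH[5]}"
    elif [[ $uri =~ $re_unix ]]; then
        # The export name is the path after "nbd+unix:///".
        echo "unix socket=${BASH_REMATCH[2]} export=${BASH_REMATCH[1]}"
    else
        echo "unsupported URI: $uri" >&2
        return 1
    fi
}
```

For example, parse_nbd_uri 'nbd+unix:///?socket=/tmp/socket' prints "unix socket=/tmp/socket export=". Anything that is not an NBD URI, such as disk.img, is rejected and could then be treated as a local file.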
Controlling sparseness or preallocation in the target
The options -a (--target-allocation), -S (--sparse-detect) and -z
(--target-is-zero) together control sparseness in the target file.
By default nbdcp both preserves sparseness from the source and detects
runs of allocated zeroes, turning them into sparseness. To turn off
sparseness detection use "-S 0".
The -z option should be used if and only if you know that the target
block device is zeroed already. This allows an important optimization
where nbdcp can skip zeroing or trimming parts of the disk that are
already zero.
The -a option is used to control the desired final preallocation state
of the target. The default is "-a sparse" which makes the target as
sparse as possible. "-a allocated" makes the target fully allocated.
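To make the default sparseness detection more concrete, the following bash sketch reports runs of all-zero blocks in a local file as "offset length" pairs. It is a hypothetical illustration: zero_runs is not part of any tool, it only examines N-byte aligned blocks rather than arbitrary runs, and it is far too slow for real disk images. A real implementation would scan buffers in C.

```shell
# Requires bash, dd, tr and wc.
# Hypothetical sketch: print "offset length" for each run of all-zero
# BS-byte blocks (default 512) in FILE, merging adjacent zero blocks.
zero_runs() {
    local file=$1 bs=${2:-512}
    local size nblocks i nonzero run_start=-1
    size=$(wc -c <"$file") || return
    nblocks=$(( (size + bs - 1) / bs ))
    for (( i = 0; i < nblocks; i++ )); do
        # Count the non-NUL bytes in block i; 0 means the block is all zero.
        nonzero=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null |
                  tr -d '\000' | wc -c)
        if (( nonzero == 0 )); then
            # Start a new zero run if we are not already in one.
            (( run_start < 0 )) && run_start=$i
        elif (( run_start >= 0 )); then
            # A data block ends the current zero run.
            echo "$(( run_start * bs )) $(( (i - run_start) * bs ))"
            run_start=-1
        fi
    done
    if (( run_start >= 0 )); then
        # The last run may end in a short final block: clip to file size.
        echo "$(( run_start * bs )) $(( size - run_start * bs ))"
    fi
}
```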
OPTIONS
--help
Display brief command line help and exit.
-a allocated
--target-allocation=allocated
Make the target fully allocated.
-a sparse
--target-allocation=sparse
Make the target as sparse as possible. This is the default. See
also "Controlling sparseness or preallocation in the target".
-b BLOCKSFILE
--block-list=BLOCKSFILE
Load the list of extents from an external file. nbdcp considers
this to be the truth for source extents. The file should contain
one record per line in the same format as nbdkit-sh-plugin(1), i.e.:
offset length type
with "offset" and "length" in bytes (size suffixes such as "M" may be
used, as in the example), and the "type" field being a comma-separated
list of the words "hole" and "zero", or omitted for allocated data.
For example:
0 1M
1M 9M hole,zero
Any parts of the source which don't have descriptions are assumed
to be of type "hole,zero".
-m N
--multi-conn=N
Enable NBD multi-conn with up to "N" connections. Only some NBD
servers support this but it can greatly improve performance.
The default is to enable multi-conn if we detect that the server
supports it, with up to 4 connections.
-M N
--multi-conn-target=N
If you are copying between NBD servers, use -m to control the
multi-conn setting for the source server, and this option (-M) to
control the multi-conn setting for the target server.
-p
--progress-bar
Display a progress bar during copying.
-p machine:FD
--progress-bar=machine:FD
Write a machine-readable progress bar to file descriptor "FD".
This progress bar prints lines with the format "COPIED/TOTAL"
(where "COPIED" and "TOTAL" are 64-bit unsigned integers).
-S 0
--sparse-detect=0
Turn off sparseness detection.
-S N
--sparse-detect=N
Detect runs of zero bytes at least "N" bytes long and turn them
into sparse blocks on the target (if "-a sparse" is used). This is
on by default, with "N" = 512 bytes.
-T N
--threads N
Use at most "N" threads when copying. More threads usually lead
to better performance, up to the limit of the number of cores on
your machine and the parallelism of the underlying disk or network.
The default is to use the number of online processors.
-z
--target-is-zero
Declare that the target block device contains only zero bytes (or
sparseness that reads back as zeroes). You must only use this
option if you are sure that this is true, since it means that nbdcp
will enable an optimization where it skips zeroing parts of the
disk that are zero on the source.
-V
--version
Display the package name and version and exit.
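A minimal bash sketch of how a -b block list might be normalized before copying: it expands size suffixes and fills undescribed gaps, including the tail up to the source size, with "hole,zero", as the -b description requires. The function names are hypothetical. K/M/G are assumed to be binary (1024-based) multiples, and records are assumed to be sorted and non-overlapping.

```shell
# Requires bash.
# Hypothetical: convert sizes like "512", "1M" to bytes (K/M/G = 1024-based).
to_bytes() {
    local n=${1%[KMG]} unit=${1##*[0-9]}
    case $unit in
        "") echo "$n" ;;
        K) echo $(( n * 1024 )) ;;
        M) echo $(( n * 1024 * 1024 )) ;;
        G) echo $(( n * 1024 * 1024 * 1024 )) ;;
        *) echo "bad size: $1" >&2; return 1 ;;
    esac
}

# Hypothetical: read "offset length [type]" records on stdin and print
# them in bytes, adding "hole,zero" records for any gaps up to TOTAL.
normalize_block_list() {
    local total=$1 pos=0 off len type
    while read -r off len type; do
        [[ -n $off ]] || continue              # skip blank lines
        off=$(to_bytes "$off") || return
        len=$(to_bytes "$len") || return
        if (( off > pos )); then
            # Undescribed gap before this record.
            echo "$pos $(( off - pos )) hole,zero"
        fi
        # An empty type (allocated data) is printed without a third field.
        echo "$off $len${type:+ $type}"
        pos=$(( off + len ))
    done
    if (( total > pos )); then
        # Undescribed tail up to the end of the source.
        echo "$pos $(( total - pos )) hole,zero"
    fi
}
```

Fed the example records above with a 20M source (20971520 bytes), this prints the two records in bytes plus a final "10485760 10485760 hole,zero" record for the undescribed tail.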
SEE ALSO
qemu-img(1), libnbd(3), nbdsh(1).
AUTHORS
Richard W.M. Jones
COPYRIGHT
Copyright (C) 2020 Red Hat Inc.
LICENSE
This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA
libnbd-1.3.1 2020-01-23 nbdcp(1)
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
* Re: qemu-img convert vs writing another copy tool
From: Max Reitz @ 2020-01-23 18:53 UTC
To: Richard W.M. Jones, qemu-devel, qemu-block, eblake, berrange,
mkletzan, ptoscano
Cc: marnold
On 23.01.20 19:35, Richard W.M. Jones wrote:
> I guess some people are aware that virt-v2v, which is a tool which
> converts guests from VMware to run on KVM, and some other
> OpenStack-OpenStack migration tools we have, use "qemu-img convert" to
> copy the data around.
>
> Historically we've had bugs here. The most recent was discussed in
> the thread on this list called "Bug? qemu-img convert to preallocated
> image makes it sparse"
> (https://www.mail-archive.com/qemu-block@nongnu.org/msg60479.html)
>
> We've been kicking around the idea of writing some alternate tool. My
> proposal would be a tool (not yet written, maybe it will never be
> written) called nbdcp for copying between NBD servers and local files.
> An outline manual page for this proposed tool is attached.
>
> Some of the things which this tool might do which qemu-img convert
> cannot do right now:
>
> - Hint that the target already contains zeroes. It's almost always
> the case that we know this, but we cannot tell qemu. This was the
> cause of a big performance regression last year.
>
> - Declare that we want the target to be either sparse or
> preallocated. qemu-img convert can sort of do this in a
> round-about way (create the target in advance and use the -n
> option), but also it's broken at the moment.
Both of these would be solved by --target-is-zero, I think.
> - NBD multi-conn. In my tests this makes a really massive
> performance difference in certain situations. Again, virt-v2v has
> a lot of information that we cannot pass to qemu: we know, for
> example, exactly if the server supports the feature, how many
> threads are available, in some situations even have information
> about the network and backing disks that the data will travel over
> / be stored on.
As far as I understand it, you use qemu-img convert with an NBD source
or target, too?
I suppose it’s always easier to let a specialized and freshly written
tool handle such information. But it sounds like if such information is
useful and makes that big of a difference, then it would be good to be
able to specify it to qemu’s NBD block driver, too.
> - Machine-parsable progress bars. You can, sort of, parse the
> progress bar from qemu-img convert, but it's not as easy as it
> could be. In particular it would be nice if the format was treated
> as ABI, and if there was a way to have the tool write the progress
> bar info to a precreated file descriptor.
It doesn’t seem impossible to add this feature to qemu-img, although I
wonder about the interface. I suppose we could make it an alternative
progress output mode (with some command-line flag), and then the
information would be emitted to stdout (just like the existing progress
report). You can of course redirect stdout to whatever fd you’d like,
so I don’t know whether qemu-img itself needs that specific capability.
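Whichever fd the lines end up on, consuming the COPIED/TOTAL format proposed in the nbdcp man page is simple. A hypothetical bash consumer (progress_percent is an illustrative name, not an existing tool) might look like this:

```shell
# Requires bash.
# Hypothetical consumer: read "COPIED/TOTAL" lines on stdin and print an
# integer percentage for each well-formed line.
progress_percent() {
    local copied total
    while IFS=/ read -r copied total; do
        # Ignore anything that is not two unsigned integers.
        [[ $copied =~ ^[0-9]+$ && $total =~ ^[0-9]+$ ]] || continue
        if (( total == 0 )); then
            echo 100
        else
            # Bash arithmetic is signed 64-bit: copied * 100 overflows
            # once copied exceeds roughly 9.2e16 bytes.
            echo $(( copied * 100 / total ))
        fi
    done
}
```

With the proposed tool this could be wired up as something like nbdcp -p machine:3 SRC DST 3> >(progress_percent), though nbdcp itself does not exist yet.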
OTOH, if you need this feature, why not just use qemu itself? That is,
a mirror or a backup block job in an otherwise empty VM.
> - External block lists. This is a rather obscure requirement, but
> it's necessary in the case where we can get the allocated block map
> from another source (eg. pyvmomi) and then want to use that with an
> NBD source that does not support extents (eg. nbdkit-ssh-plugin /
> libssh / sftp). [Having said that, it may be possible to implement
> this as an nbdkit filter, so maybe this is not a blocking feature.]
That too seems like a feature that’s easily implementable in a
specialized tool, but hard to implement in qemu-img.
I suppose we’d want a dirty bitmap copy mode which copies only the
regions that the bitmap reports as dirty – but at that point you’re
probably again better off not using qemu-img, but qemu itself. Then
we’d need some way to import bitmaps, and I actually don’t think we have
that yet.
But again, if this is a generally useful feature, I think we want it in
qemu anyway.
> One thing which qemu-img convert can do which nbdcp could not:
>
> - Read or write from qcow2 files.
>
> So instead of splitting the ecosystem and writing a new tool that
> doesn't do as much as qemu-img convert, I wonder what qemu developers
> think about the above missing features? For example, are they in
> scope for qemu-img convert?
What I think is that there may be features that we don’t want in
qemu-img, because they are more appropriate for the mirror or backup
block job. For example, I don’t think we want to let qemu-img convert
mess around with dirty bitmaps.
But apart from that, the features you propose all seem useful to have in
qemu itself. Maybe some of them are too hard to implement (specifically
importing bitmaps from external sources), then it might be pragmatic to
write a new tool where such features can be easily implemented because
they don’t need to be integrated into an existing API.
As for performance, well, if qemu’s NBD driver is slow, then naively I’d
think that’s a bug, isn’t it? And if improving performance requires
knobs, then that’s how it is.
Max
* Re: qemu-img convert vs writing another copy tool
From: Richard W.M. Jones @ 2020-01-23 19:17 UTC
To: Max Reitz; +Cc: berrange, qemu-block, qemu-devel, ptoscano, marnold, mkletzan
On Thu, Jan 23, 2020 at 07:53:57PM +0100, Max Reitz wrote:
> On 23.01.20 19:35, Richard W.M. Jones wrote:
> > - NBD multi-conn. In my tests this makes a really massive
> > performance difference in certain situations. Again, virt-v2v has
> > a lot of information that we cannot pass to qemu: we know, for
> > example, exactly if the server supports the feature, how many
> > threads are available, in some situations even have information
> > about the network and backing disks that the data will travel over
> > / be stored on.
>
> As far as I understand it, you use qemu-img convert with an NBD source
> or target, too?
Virt-v2v has many modes, but yes generally there will be either an NBD
source & target, or an NBD source to a local file target.
> I suppose it’s always easier to let a specialized and freshly written
> tool handle such information. But it sounds like if such information is
> useful and makes that big of a difference, then it would be good to be
> able to specify it to qemu’s NBD block driver, too.
qemu-img convert has worked really well for us, and I'm actually _not_
confident that I could do better with a specialized tool. But there's
definitely more info we could pass, such as the amount of parallelism
we believe is available in the NBD server / processors / disks.
> > - Machine-parsable progress bars. You can, sort of, parse the
> > progress bar from qemu-img convert, but it's not as easy as it
> > could be. In particular it would be nice if the format was treated
> > as ABI, and if there was a way to have the tool write the progress
> > bar info to a precreated file descriptor.
>
> It doesn’t seem impossible to add this feature to qemu-img, although I
> wonder about the interface. I suppose we could make it an alternative
> progress output mode (with some command-line flag), and then the
> information would be emitted to stdout (just like the existing progress
> report). You can of course redirect stdout to whatever fd you’d like,
> so I don’t know whether qemu-img itself needs that specific capability.
>
> OTOH, if you need this feature, why not just use qemu itself? That is,
> a mirror or a backup block job in an otherwise empty VM.
I don't think we've really thought before about this approach. Maybe
the launching of a VM (even an empty / stopped one) could be a
problem. I guess this is what the new tool that was recently proposed
upstream might help with? (Was it called qemu-block-storage? I can't
find it right this minute)
> > - External block lists. This is a rather obscure requirement, but
> > it's necessary in the case where we can get the allocated block map
> > from another source (eg. pyvmomi) and then want to use that with an
> > NBD source that does not support extents (eg. nbdkit-ssh-plugin /
> > libssh / sftp). [Having said that, it may be possible to implement
> > this as an nbdkit filter, so maybe this is not a blocking feature.]
>
> That too seems like a feature that’s easily implementable in a
> specialized tool, but hard to implement in qemu-img.
>
> I suppose we’d want a dirty bitmap copy mode which copies only the
> regions that the bitmap reports as dirty – but at that point you’re
> probably again better off not using qemu-img, but qemu itself. Then
> we’d need some way to import bitmaps, and I actually don’t think we have
> that yet.
>
> But again, if this is a generally useful feature, I think we want it in
> qemu anyway.
I think this is actually one we can more easily implement as an nbdkit
filter. I'm going to try this and see.
> > One thing which qemu-img convert can do which nbdcp could not:
> >
> > - Read or write from qcow2 files.
> >
> > So instead of splitting the ecosystem and writing a new tool that
> > doesn't do as much as qemu-img convert, I wonder what qemu developers
> > think about the above missing features? For example, are they in
> > scope for qemu-img convert?
>
> What I think is that there may be features that we don’t want in
> qemu-img, because they are more appropriate for the mirror or backup
> block job. For example, I don’t think we want to let qemu-img convert
> mess around with dirty bitmaps.
>
> But apart from that, the features you propose all seem useful to have in
> qemu itself. Maybe some of them are too hard to implement (specifically
> importing bitmaps from external sources), then it might be pragmatic to
> write a new tool where such features can be easily implemented because
> they don’t need to be integrated into an existing API.
>
> As for performance, well, if qemu’s NBD driver is slow, then naively I’d
> think that’s a bug, isn’t it? And if improving performance requires
> knobs, then that’s how it is.
Thanks,
Rich.
* Re: qemu-img convert vs writing another copy tool
From: Eric Blake @ 2020-01-23 19:21 UTC
To: Richard W.M. Jones, qemu-devel, qemu-block, mreitz, berrange,
mkletzan, ptoscano
Cc: marnold
On 1/23/20 12:35 PM, Richard W.M. Jones wrote:
> I guess some people are aware that virt-v2v, which is a tool which
> converts guests from VMware to run on KVM, and some other
> OpenStack-OpenStack migration tools we have, use "qemu-img convert" to
> copy the data around.
>
> Historically we've had bugs here. The most recent was discussed in
> the thread on this list called "Bug? qemu-img convert to preallocated
> image makes it sparse"
> (https://www.mail-archive.com/qemu-block@nongnu.org/msg60479.html)
>
> We've been kicking around the idea of writing some alternate tool. My
> proposal would be a tool (not yet written, maybe it will never be
> written) called nbdcp for copying between NBD servers and local files.
> An outline manual page for this proposed tool is attached.
>
> Some of the things which this tool might do which qemu-img convert
> cannot do right now:
>
> - Hint that the target already contains zeroes. It's almost always
> the case that we know this, but we cannot tell qemu. This was the
> cause of a big performance regression last year.
This has just recently been proposed:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg03617.html
I'm also working on a patch that I will post soon that extends the NBD
protocol to advertise this information (it will help the situation where
the destination is NBD, but as that requires a new enough server to
advertise the information, having the feature as a command-line option
allows the same speedup even without the server supporting the extension).
>
> - Declare that we want the target to be either sparse or
> preallocated. qemu-img convert can sort of do this in a
> round-about way (create the target in advance and use the -n
> option), but also it's broken at the moment.
>
> - NBD multi-conn. In my tests this makes a really massive
> performance difference in certain situations. Again, virt-v2v has
> a lot of information that we cannot pass to qemu: we know, for
> example, exactly if the server supports the feature, how many
> threads are available, in some situations even have information
> about the network and backing disks that the data will travel over
> / be stored on.
Multi-conn for reading the source allows better parallelism. Multi-conn
for writing is a bit trickier - it should be safe if the different
connections are only touching distinct segments of the export (no
overlaps), but as qemu does not advertise multiconn in such situations,
you may still need a command-line switch to force multiple writers in
spite of the server not advertising it. Here, I'm not aware of anyone
with patches underway, but I also think it would be a good ground for
exploring.
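One way to guarantee the "distinct segments, no overlaps" condition for multiple writers is to give each connection its own contiguous range of the export. The bash sketch below (split_ranges is a hypothetical helper) computes such ranges. Rounding the boundaries down to a 4096-byte alignment is an assumption meant to keep two writers from ever sharing a block. It is not a requirement taken from the NBD spec.

```shell
# Requires bash.
# Hypothetical: split SIZE bytes into N contiguous, non-overlapping ranges
# ("offset length"), with internal boundaries rounded down to ALIGN bytes.
split_ranges() {
    local size=$1 n=$2 align=${3:-4096}
    local i start end
    for (( i = 0; i < n; i++ )); do
        # Range i starts exactly where range i-1 ended (same formula).
        start=$(( size * i / n / align * align ))
        if (( i == n - 1 )); then
            end=$size                   # the last range takes the remainder
        else
            end=$(( size * (i + 1) / n / align * align ))
        fi
        # Alignment can make a small range empty; skip it.
        if (( end > start )); then
            echo "$start $(( end - start ))"
        fi
    done
}
```

Each writer connection would then only issue writes inside its own line of output, for example split_ranges 10485760 4 gives four 2621440-byte ranges.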
>
> - Machine-parsable progress bars. You can, sort of, parse the
> progress bar from qemu-img convert, but it's not as easy as it
> could be. In particular it would be nice if the format was treated
> as ABI, and if there was a way to have the tool write the progress
> bar info to a precreated file descriptor.
Would be nice, but I'm not aware of anyone currently planning to add it.
>
> - External block lists. This is a rather obscure requirement, but
> it's necessary in the case where we can get the allocated block map
> from another source (eg. pyvmomi) and then want to use that with an
> NBD source that does not support extents (eg. nbdkit-ssh-plugin /
> libssh / sftp). [Having said that, it may be possible to implement
> this as an nbdkit filter, so maybe this is not a blocking feature.]
How are you intending to use this? I'm guessing you have some way of
feeding in information to qemu-img of which portions of the source image
you want to copy, and ignore remaining portions.
Note that it IS already possible to use qemu's copy-on-read feature as a
way to copy only a subset of a source file over to a destination file.
When demonstrating incremental backup, I wrote this shell function:
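copyif() {
    # Usage: copyif SRC-EXPORT DST-QCOW2 [BITMAP]
    # Expects an NBD server on localhost:10809 exporting SRC-EXPORT, and
    # $qemu_img naming the qemu-img binary (defaults to qemu-img).
    local qemu_img=${qemu_img:-qemu-img}
    local map_from state ret line start len
    if test $# -lt 2 || test $# -gt 3; then
        echo 'usage: copyif src dst [bitmap]'
        return 1
    fi
    if test -z "$3"; then
        # No bitmap: copy every extent the map reports as data.
        map_from="-f raw nbd://localhost:10809/$1"
        state=true
    else
        # With x-dirty-bitmap, the dirty regions are the extents that
        # qemu-img map reports as data:false, so match those instead.
        map_from="--image-opts driver=nbd,export=$1,server.type=inet"
        map_from+=",server.host=localhost,server.port=10809"
        map_from+=",x-dirty-bitmap=qemu:dirty-bitmap:$3"
        state=false
    fi
    $qemu_img info -f raw nbd://localhost:10809/$1 || return
    $qemu_img info -f qcow2 $2 || return
    ret=0
    # Temporarily make the NBD export the backing file of the destination.
    $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/$1 $2
    while read -r line; do
        [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] ||
            continue
        start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
        echo
        echo " $start $len:"
        # Copy-on-read (-C) pulls this range in from the backing file.
        qemu-io -C -c "r $start $len" -f qcow2 $2 || ret=1
    done < <($qemu_img map --output=json $map_from)
    # Detach the backing file again.
    $qemu_img rebase -u -f qcow2 -b '' $2
    if test $ret = 0; then echo 'Success!'; fi
    return $ret
}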
The key lines here are 'qemu-io -C -c "r $start $len" -f qcow2 $2',
which runs in a loop to read just the targeted portions of the
destination qcow2 file with copy-on-read enabled, pulling each portion
in from its backing file, and '<($qemu_img map --output=json $map_from)',
which derives the extent map that drives which portions of the file to
read.
We also have 'qemu-img dd' that can copy subsets of a file, although it
is not currently the ideal interface, and probably needs to be enhanced
(I have a branch where I had tried working on patches for it, but where
the feedback was that we want the improvements to be more generic, or
even teach 'qemu-img convert' to support offsets the way 'qemu-img dd'
tries to; I'd need to revisit that branch...)
>
> One thing which qemu-img convert can do which nbdcp could not:
>
> - Read or write from qcow2 files.
Although you could still couple things together: nbdcp for new features
plus qemu-nbd to drive an NBD wrapper around qcow2 (as source or as
destination).
>
> So instead of splitting the ecosystem and writing a new tool that
> doesn't do as much as qemu-img convert, I wonder what qemu developers
> think about the above missing features? For example, are they in
> scope for qemu-img convert?
>
I could see all of these being viable additions to qemu-img, but also
wonder if writing nbdcp would get those features available in a faster
manner.
>
> SYNOPSIS
> nbdcp [-a|--target-allocation allocated|sparse]
> [-b|--block-list <blocksfile>]
These make sense for any qemu-img format.
> [-m|--multi-conn <n>] [-M|--multi-conn-target <n>]
These might make more sense as tunables for how to set up the NBD client
(destination) or server (source), rather than directly as qemu-img
options. That is, I could imagine that we'd use qemu-img
--image-opts, and then expose new blockdev-style knobs for setting up
the NBD endpoint to enable multi-conn usage of that endpoint.
> [-p|--progress-bar] [-S|--sparse-detect <n>]
> [-T|--threads <n>] [-z|--target-is-zero]
> 'nbd://...'|DISK.IMG 'nbd://...'|DISK.IMG
And these options also seem like they are useful to qemu-img proper.
>
> This program cannot: copy from file to file (use cp(1) or dd(1)), copy
> to or from formats other than raw (use qemu-img(1) convert), or access
> servers other than NBD servers (also use qemu-img(1)).
Again, depending on how we want to mix-and-match things, using qemu-nbd
to create the NBD endpoint for the nbdcp source or destination may be
worthwhile (which is different than directly using qemu-img); we'd want
some decent examples of building such chains between tools. Or it could
help us decide whether we can cut out some overhead by consolidating
typical uses into one tool rather than requiring convoluted chains.
>
> -b BLOCKSFILE
> --block-list=BLOCKSFILE
> Load the list of extents from an external file. nbdcp considers
> this to be the truth for source extents. The file should contain
> one record per line in the same format as nbdkit-sh-plugin(1), ie:
>
> offset length type
>
> with "offset" and "length" in bytes, and the "type" field being a
> comma-separated list of the words "hole" and "zero". For example:
>
> 0 1M
> 1M 9M hole,zero
Could we also teach this to parse the 'qemu-img map --output=json' format?
And/or add a 'qemu-img map --output=XYZ' mode (different from the current
--output=human) that gives sufficient information? (Note:
--output=human is NOT suitable for extent lists: it intentionally
outputs only the data portions, and in so doing makes 'hole' and
'hole,zero' segments indistinguishable.)
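As a rough idea of how little glue the first suggestion needs, this bash sketch converts 'qemu-img map --output=json' lines into the proposed block-list format. It matches fields with regexes instead of using a JSON parser, so it assumes one extent object per line, as the copyif loop earlier in this message also does. The function name is hypothetical, and the mapping of data:false to "hole" and zero:true to "zero" is an assumption, not something either tool specifies.

```shell
# Requires bash.
# Hypothetical: turn qemu-img map JSON lines on stdin into
# "offset length [type]" records.
map_json_to_blocks() {
    local line start len zero data type
    local re_start='"start": *([0-9]+)'
    local re_len='"length": *([0-9]+)'
    local re_zero='"zero": *(true|false)'
    local re_data='"data": *(true|false)'
    while read -r line; do
        [[ $line =~ $re_start ]] || continue
        start=${BASH_REMATCH[1]}
        [[ $line =~ $re_len ]] || continue
        len=${BASH_REMATCH[1]}
        [[ $line =~ $re_zero ]] || continue
        zero=${BASH_REMATCH[1]}
        [[ $line =~ $re_data ]] || continue
        data=${BASH_REMATCH[1]}
        # Assumed mapping: unallocated -> "hole", reads as zero -> "zero".
        type=
        [[ $data == false ]] && type=hole
        [[ $zero == true ]] && type=${type:+$type,}zero
        echo "$start $len${type:+ $type}"
    done
}
```

For the two-extent map used in the test below, the output is the same "0 1M / 1M 9M hole,zero" list as the man page example, with sizes in bytes.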
>
> -p
> --progress-bar
> Display a progress bar during copying.
>
> -p machine:FD
> --progress-bar=machine:FD
> Write a machine-readable progress bar to file descriptor "FD".
> This progress bar prints lines with the format "COPIED/TOTAL"
> (where "COPIED" and "TOTAL" are 64 bit unsigned integers).
Supporting optional arguments to long options is okay, but supporting
optional arguments to short options gets tricky when using getopt. I
would recommend two separate options: '-p' with no argument as shorthand
for progress to stderr, and '-P description' with a mandatory argument
saying where to send progress, rather than trying to let '-p' take an
optional argument.
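The same optstring rules apply to bash's getopts builtin, so the split described above can be sketched in shell. Here '-p' takes no argument and '-P' requires one. parse_progress_opts is a hypothetical name, and its output format is only for illustration.

```shell
# Requires bash.
# Hypothetical: '-p' (no argument) means progress to stderr,
# '-P DEST' (mandatory argument) names the progress destination.
parse_progress_opts() {
    local opt OPTIND=1 progress=none
    # Leading ':' selects silent error reporting, handled in the case.
    while getopts ':pP:' opt; do
        case $opt in
            p) progress=stderr ;;
            P) progress=$OPTARG ;;
            :) echo "option -$OPTARG requires an argument" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $(( OPTIND - 1 ))
    echo "progress=$progress args=$*"
}
```

Note that parse_progress_opts -p machine:3 src prints "progress=stderr args=machine:3 src": with a no-argument '-p', "machine:3" simply becomes a positional argument. That is the ambiguity an optional short-option argument would create.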
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
* Re: qemu-img convert vs writing another copy tool
From: Markus Armbruster @ 2020-01-24 5:45 UTC
To: Richard W.M. Jones
Cc: berrange, qemu-block, qemu-devel, ptoscano, mkletzan, marnold,
Max Reitz
"Richard W.M. Jones" <rjones@redhat.com> writes:
> On Thu, Jan 23, 2020 at 07:53:57PM +0100, Max Reitz wrote:
>> On 23.01.20 19:35, Richard W.M. Jones wrote:
>> > - NBD multi-conn. In my tests this makes a really massive
>> > performance difference in certain situations. Again, virt-v2v has
>> > a lot of information that we cannot pass to qemu: we know, for
>> > example, exactly if the server supports the feature, how many
>> > threads are available, in some situations even have information
>> > about the network and backing disks that the data will travel over
>> > / be stored on.
>>
>> As far as I understand it, you use qemu-img convert with an NBD source
>> or target, too?
>
> Virt-v2v has many modes, but yes generally there will be either an NBD
> source & target, or an NBD source to a local file target.
>
>> I suppose it’s always easier to let a specialized and freshly written
>> tool handle such information. But it sounds like if such information is
>> useful and makes that big of a difference, then it would be good to be
>> able to specify it to qemu’s NBD block driver, too.
>
> qemu-img convert has worked really well for us, and I'm actually _not_
> confident that I could do better with a specialized tool. But there's
> definitely more info we could pass, such as the amount of parallelism
> we believe is available in the NBD server / processors / disks.
>
>> > - Machine-parsable progress bars. You can, sort of, parse the
>> > progress bar from qemu-img convert, but it's not as easy as it
>> > could be. In particular it would be nice if the format was treated
>> > as ABI, and if there was a way to have the tool write the progress
>> > bar info to a precreated file descriptor.
>>
>> It doesn’t seem impossible to add this feature to qemu-img, although I
>> wonder about the interface. I suppose we could make it an alternative
>> progress output mode (with some command-line flag), and then the
>> information would be emitted to stdout (just like the existing progress
>> report). You can of course redirect stdout to whatever fd you’d like,
>> so I don’t know whether qemu-img itself needs that specific capability.
>>
>> OTOH, if you need this feature, why not just use qemu itself? That is,
>> a mirror or a backup block job in an otherwise empty VM.
>
> I don't think we've really thought before about this approach. Maybe
> the launching of a VM (even an empty / stopped one) could be a
> problem. I guess this is what the new tool that was recently proposed
> upstream might help with? (Was it called qemu-block-storage? I can't
> find it right this minute)
Subject: [RFC PATCH 00/18] Add qemu-storage-daemon
To: qemu-block@nongnu.org
Date: Thu, 17 Oct 2019 15:01:46 +0200
Message-Id: <20191017130204.16131-1-kwolf@redhat.com>
[...]
* Re: qemu-img convert vs writing another copy tool
2020-01-23 19:21 ` Eric Blake
@ 2020-01-24 9:55 ` Richard W.M. Jones
2020-01-24 13:49 ` Richard W.M. Jones
0 siblings, 1 reply; 7+ messages in thread
From: Richard W.M. Jones @ 2020-01-24 9:55 UTC (permalink / raw)
To: Eric Blake
Cc: berrange, qemu-block, qemu-devel, ptoscano, marnold, mkletzan,
mreitz
On Thu, Jan 23, 2020 at 01:21:28PM -0600, Eric Blake wrote:
> On 1/23/20 12:35 PM, Richard W.M. Jones wrote:
> > - Hint that the target already contains zeroes. It's almost always
> > the case that we know this, but we cannot tell qemu. This was the
> > cause of a big performance regression last year.
>
> This has just recently been proposed:
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg03617.html
Oh indeed, this is good.
> > - NBD multi-conn. In my tests this makes a really massive
> > performance difference in certain situations. Again, virt-v2v has
> > a lot of information that we cannot pass to qemu: we know, for
> > example, exactly if the server supports the feature, how many
> > threads are available, in some situations even have information
> > about the network and backing disks that the data will travel over
> > / be stored on.
>
> Multi-conn for reading the source allows better parallelism.
> Multi-conn for writing is a bit trickier - it should be safe if the
> different connections are only touching distinct segments of the
> export (no overlaps), but as qemu does not advertise multiconn in
> such situations, you may still need a command-line switch to force
> multiple writers in spite of the server not advertising it. Here,
> I'm not aware of anyone with patches underway, but I also think it
> would be a good ground for exploring.
But in the qemu-img convert case specifically, multi-conn should
be safe for writing?
One additional problem with multi-conn is that NBD servers only
advertise that the feature is present, not the best possible degree of
parallelism to use. (It's possible that the server cannot or doesn't
know this.)
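Eric's condition for safe multi-conn writes is that connections touch only distinct, non-overlapping segments. Below is a minimal POSIX shell sketch of carving an export into one disjoint range per writer connection. The name split_ranges and the 64 KiB default alignment are illustrative assumptions, not part of qemu-img or nbdkit. Aligning chunk boundaries is meant to keep two connections out of the same underlying block, assuming the server's block size divides the alignment.

```shell
# split_ranges SIZE N [ALIGN]: print "OFFSET LENGTH" lines that split
# the byte range [0, SIZE) into at most N disjoint, contiguous chunks,
# one per writer connection. Chunk sizes are rounded up to a multiple
# of ALIGN (default 65536), so the last chunk may be shorter and fewer
# than N chunks may be printed for small exports.
# Hypothetical helper for illustration; not part of qemu-img or nbdkit.
split_ranges() {
    size=$1 n=$2 align=${3:-65536}
    # Ceiling division gives each connection roughly SIZE/N bytes...
    chunk=$(( (size + n - 1) / n ))
    # ...then round up so every chunk boundary is ALIGN-aligned.
    chunk=$(( (chunk + align - 1) / align * align ))
    off=0
    while [ "$off" -lt "$size" ]; do
        len=$chunk
        # Clip the final chunk at the end of the export.
        if [ $(( off + len )) -gt "$size" ]; then
            len=$(( size - off ))
        fi
        echo "$off $len"
        off=$(( off + len ))
    done
}
```

For example, split_ranges 1048576 4 65536 prints four ranges of 262144 bytes each, starting at offsets 0, 262144, 524288 and 786432.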
> > - External block lists. This is a rather obscure requirement, but
> > it's necessary in the case where we can get the allocated block map
> > from another source (eg. pyvmomi) and then want to use that with an
> > NBD source that does not support extents (eg. nbdkit-ssh-plugin /
> > libssh / sftp). [Having said that, it may be possible to implement
> > this as an nbdkit filter, so maybe this is not a blocking feature.]
>
> How are you intending to use this? I'm guessing you have some way of
> feeding in information to qemu-img of which portions of the source
> image you want to copy, and ignore remaining portions.
I should say first that I've nearly finished an nbdkit filter
implementation of this, so feel free to ignore this for qemu.
The background to this feature is that some block device backends do
not have support for determining extents / disk block allocation
status. The one that is most frequently used is ssh (sftp). Note
that adding this support to sftp, while possible, doesn't really solve
the problem because the proprietary hypervisors we are pulling from
don't use recent SSH servers.
So copying from SSH is slow because you have no choice except to read
vast amounts of zeroes or deleted data. (This doesn't affect virt-v2v
because it has another strategy to avoid this, but it does affect
other scenarios such as "warm" conversions and any migration that
doesn't involve using virt-v2v.)
However, you can get the extent information by other means. For VMware
you can use VMOMI to read this. Or you can ssh in and run commands
like xfs_bmap.
So in theory at least it's possible to assemble the required data
from multiple sources and thus avoid wasteful copying.
With nbdkit you'll be able to do something like:
# fetch the extents list over VMOMI > extents.txt, then
nbdkit -U /tmp/sock --filter=extentlist ssh \
host=server /vmfs/.../file-flat.vmdk \
extentlist=extents.txt
qemu-img convert nbd:unix:/tmp/sock ...
> Note that it IS already possible to use qemu's copy-on-read feature
> as a way to copy only a subset of a source file over to a
> destination file. When demonstrating incremental backup, I wrote
> this shell function:
>
> copyif() {
>     if test $# -lt 2 || test $# -gt 3; then
>         echo 'usage: copyif src dst [bitmap]'
>         return 1
>     fi
>     if test -z "$3"; then
>         map_from="-f raw nbd://localhost:10809/$1"
>         state=true
>     else
>         map_from="--image-opts driver=nbd,export=$1,server.type=inet"
>         map_from+=",server.host=localhost,server.port=10809"
>         map_from+=",x-dirty-bitmap=qemu:dirty-bitmap:$3"
>         state=false
>     fi
>     $qemu_img info -f raw nbd://localhost:10809/$1 || return
>     $qemu_img info -f qcow2 $2 || return
>     ret=0
>     $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/$1 $2
>     while read line; do
>         [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] || continue
>         start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
>         echo
>         echo " $start $len:"
>         qemu-io -C -c "r $start $len" -f qcow2 $2
>     done < <($qemu_img map --output=json $map_from)
>     $qemu_img rebase -u -f qcow2 -b '' $2
>     if test $ret = 0; then echo 'Success!'; fi
>     return $ret
> }
>
> The key lines here are 'qemu-io -C -c "r $start $len" -f qcow2 $2',
> which is performed in a loop to read just targeted portions of the
> destination qcow2 file with copy-on-read set to pull in that portion
> from its backing file, and '<($qemu_img map --output=json
> $map_from)' which was used to derive the extent map driving which
> portions of the file to read.
>
> We also have 'qemu-img dd' that can copy subsets of a file, although
> it is not currently the ideal interface, and probably needs to be
> enhanced (I have a branch where I had tried working on patches for
> it, but where the feedback was that we want the improvements to be
> more generic, or even teach 'qemu-img convert' to support offsets
> the way 'qemu-img dd' tries to; I'd need to revisit that branch...)
>
> >
> >One thing which qemu-img convert can do which nbdcp could not:
> >
> > - Read or write from qcow2 files.
>
> Although you could still couple things together: nbdcp for new
> features plus qemu-nbd to drive an NBD wrapper around qcow2 (as
> source or as destination).
>
> >
> >So instead of splitting the ecosystem and writing a new tool that
> >doesn't do as much as qemu-img convert, I wonder what qemu developers
> >think about the above missing features? For example, are they in
> >scope for qemu-img convert?
> >
>
> I could see all of these being viable additions to qemu-img, but
> also wonder if writing nbdcp would get those features available in a
> faster manner.
>
>
> >
> >SYNOPSIS
> > nbdcp [-a|--target-allocation allocated|sparse]
> > [-b|--block-list <blocksfile>]
>
> These make sense for any qemu-img format.
>
> > [-m|--multi-conn <n>] [-M|--multi-conn-target <n>]
>
> These might make more sense as tunables for how to set up NBD client
> (destination) or server (source), rather than directly as qemu-img
> options. That is, I could imagine that we'd use qemu-img
> --image-opts, and then expose new blockdev-style knobs for setting
> up the NBD endpoint to enable multiconn usage of that endpoint.
Yes, this makes sense.
> > [-p|--progress-bar] [-S|--sparse-detect <n>]
> > [-T|--threads <n>] [-z|--target-is-zero]
> > 'nbd://...'|DISK.IMG 'nbd://...'|DISK.IMG
>
> And these options also seem like they are useful to qemu-img proper.
>
> >
> > This program cannot: copy from file to file (use cp(1) or dd(1)), copy
> > to or from formats other than raw (use qemu-img(1) convert), or access
> > servers other than NBD servers (also use qemu-img(1)).
>
> Again, depending on how we want to mix-and-match things, using
> qemu-nbd to create the NBD endpoint for the nbdcp source or
> destination may be worthwhile (which is different than directly
> using qemu-img); we'd want some decent examples of building such
> chains between tools. Or it could help us decide whether we can cut
> out some overhead by consolidating typical uses into one tool rather
> than requiring convoluted chains.
>
>
> >
> > -b BLOCKSFILE
> > --block-list=BLOCKSFILE
> > Load the list of extents from an external file. nbdcp considers
> > this to be the truth for source extents. The file should contain
> > one record per line in the same format as nbdkit-sh-plugin(1), ie:
> >
> > offset length type
> >
> > with "offset" and "length" in bytes, and the "type" field being a
> > comma-separated list of the words "hole" and "zero". For example:
> >
> > 0 1M
> > 1M 9M hole,zero
>
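A minimal POSIX shell sketch of consuming the block-list format described above: it expands size suffixes and prints the byte ranges a copier would still have to read. Two assumptions are baked in: K/M/G suffixes are treated as powers of 1024, and only extents whose type includes "zero" are skipped, because in NBD block-status terms a bare "hole" says the range is unallocated, not that it reads back as zeroes. The helper names to_bytes and data_extents are hypothetical.

```shell
# to_bytes SIZE: expand an optional K/M/G suffix into a byte count.
# Assumption: suffixes are powers of 1024.
to_bytes() {
    case $1 in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# data_extents: read "offset length [type]" records on stdin and print
# "OFFSET LENGTH" in bytes for every extent that must still be read.
# Only extents whose type includes "zero" are skipped; a bare "hole"
# is unallocated but not promised to read back as zeroes.
# Hypothetical helper for illustration.
data_extents() {
    while read -r offset length type; do
        # Skip blank lines.
        [ -n "$offset" ] || continue
        case ",$type," in
            *,zero,*) continue ;;
        esac
        echo "$(to_bytes "$offset") $(to_bytes "$length")"
    done
}
```

Fed the two example records above, data_extents prints only "0 1048576": the second extent is flagged zero and needs no read.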
> Could we also teach this to parse 'qemu-img map --output=json'
> format? And/or add 'qemu-img map --output=XYZ' (different from the
> current '--output=human') that gives sufficient information? (Note:
> --output=human is NOT suitable for extent lists - it intentionally
> outputs only the data portions, and in so doing coalesces 'hole' and
> 'hole,zero' segments to be indistinguishable).
If qemu-img doesn't have the data (we have to get it from
another source), is the output of qemu-img map relevant?
Rich.
> >
> > -p
> > --progress-bar
> > Display a progress bar during copying.
> >
> > -p machine:FD
> > --progress-bar=machine:FD
> > Write a machine-readable progress bar to file descriptor "FD".
> > This progress bar prints lines with the format "COPIED/TOTAL"
> > (where "COPIED" and "TOTAL" are 64-bit unsigned integers).
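A minimal consumer for the proposed "COPIED/TOTAL" lines, written as a POSIX shell filter (progress_percent is a hypothetical name). It is only a sketch: arithmetic in typical shells is signed 64-bit, so COPIED * 100 would overflow above roughly 9.2 × 10^16 bytes, well short of the full unsigned 64-bit range the format allows.

```shell
# progress_percent: read "COPIED/TOTAL" lines on stdin and print an
# integer percentage for each well-formed line.
# Hypothetical helper for illustration.
progress_percent() {
    while IFS=/ read -r copied total; do
        # Ignore malformed lines (empty or non-numeric fields).
        case $copied in ''|*[!0-9]*) continue ;; esac
        case $total in ''|*[!0-9]*) continue ;; esac
        # Avoid dividing by zero before the total is known.
        [ "$total" -gt 0 ] || continue
        echo $(( copied * 100 / total ))
    done
}
```

With the proposed (not yet written) tool, the descriptor could be routed into the filter with something like nbdcp --progress-bar=machine:3 SRC DST 3>&1 >/dev/null | progress_percent.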
>
> Supporting optional arguments to long options is okay, but
> supporting optional arguments to short options gets tricky when
> using getopt. I would recommend two separate options, '-p' with no
> argument as shorthand for progress to stderr, and '-P description'
> with a mandatory argument for where to send progress, rather than trying
> to let '-p' have optional argument.
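Eric's suggestion maps directly onto a getopt-style option string: "pP:" declares -p with no argument and -P with a required one, so no short option needs an optional argument. A minimal POSIX shell sketch using getopts follows; parse_opts and its output format are hypothetical, chosen only to make the behaviour easy to check. A C implementation would pass the same option string to getopt(3) or getopt_long(3).

```shell
# parse_opts ARGS...: parse options as suggested above, with -p taking
# no argument and -P taking a mandatory one, then print the chosen
# progress mode and the remaining operands.
# Hypothetical helper; option letters follow the suggestion above.
parse_opts() {
    progress=none
    OPTIND=1            # reset so the function can be called repeatedly
    while getopts 'pP:' opt "$@"; do
        case $opt in
            p) progress=stderr ;;               # human-readable bar
            P) progress="machine:$OPTARG" ;;    # e.g. -P 3 for fd 3
            *) return 1 ;;                      # unknown option or missing argument
        esac
    done
    shift $(( OPTIND - 1 ))
    echo "progress=$progress args=$*"
}
```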
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc. +1-919-301-3226
> Virtualization: qemu.org | libvirt.org
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
* Re: qemu-img convert vs writing another copy tool
2020-01-24 9:55 ` Richard W.M. Jones
@ 2020-01-24 13:49 ` Richard W.M. Jones
0 siblings, 0 replies; 7+ messages in thread
From: Richard W.M. Jones @ 2020-01-24 13:49 UTC (permalink / raw)
To: Eric Blake
Cc: berrange, qemu-block, qemu-devel, ptoscano, marnold, mkletzan,
mreitz
On Fri, Jan 24, 2020 at 09:55:55AM +0000, Richard W.M. Jones wrote:
> On Thu, Jan 23, 2020 at 01:21:28PM -0600, Eric Blake wrote:
> > Could we also teach this to parse 'qemu-img map --output=json'
> > format? And/or add 'qemu-img map --output=XYZ' (different from the
> > current '--output=human') that gives sufficient information? (Note:
> > --output=human is NOT suitable for extent lists - it intentionally
> > outputs only the data portions, and in so doing coalesces 'hole' and
> > 'hole,zero' segments to be indistinguishable).
>
> If qemu-img doesn't have the data (we have to get it from
> another source), is the output of qemu-img map relevant?
I can see that we might use this to transfer a map from one qemu
source to another, which could be useful. Unfortunately nbdkit
doesn't link to any libraries that can read JSON at the moment :-(
But certainly something to keep in mind for the future.
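As a rough idea of what such a transfer could look like without a JSON library, here is a POSIX shell/sed sketch that turns 'qemu-img map --output=json' output into the "offset length type" records from the nbdcp outline. It rests on two assumptions: qemu-img prints one JSON object per line (the same assumption Eric's copyif function makes), and "data": false corresponds to hole while "zero": true corresponds to zero. json_map_to_extents is a hypothetical name.

```shell
# json_map_to_extents: read 'qemu-img map --output=json' on stdin and
# print one "offset length [type]" record per extent.
# Assumes one JSON object per line; hypothetical helper.
json_map_to_extents() {
    while IFS= read -r line; do
        # Pull out each field with sed; field order does not matter.
        start=$(printf '%s\n' "$line" | sed -n 's/.*"start": *\([0-9]*\).*/\1/p')
        length=$(printf '%s\n' "$line" | sed -n 's/.*"length": *\([0-9]*\).*/\1/p')
        zero=$(printf '%s\n' "$line" | sed -n 's/.*"zero": *\([a-z]*\).*/\1/p')
        data=$(printf '%s\n' "$line" | sed -n 's/.*"data": *\([a-z]*\).*/\1/p')
        # Skip lines that are not extent records.
        [ -n "$start" ] && [ -n "$length" ] || continue
        type=
        # Assumed mapping: no data here -> hole; reads as zero -> zero.
        [ "$data" = false ] && type=hole
        [ "$zero" = true ] && type="${type:+$type,}zero"
        echo "$start $length${type:+ $type}"
    done
}
```

Usage would be along the lines of qemu-img map --output=json -f raw disk.img | json_map_to_extents > extents.txt.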
Rich.
Thread overview: 7+ messages
2020-01-23 18:35 qemu-img convert vs writing another copy tool Richard W.M. Jones
2020-01-23 18:53 ` Max Reitz
2020-01-23 19:17 ` Richard W.M. Jones
2020-01-24 5:45 ` Markus Armbruster
2020-01-23 19:21 ` Eric Blake
2020-01-24 9:55 ` Richard W.M. Jones
2020-01-24 13:49 ` Richard W.M. Jones