* Clarification regarding new qemu-img convert --target-is-zero flag
@ 2020-06-10  5:28 Sam Eiderman
  2020-06-10  6:16 ` Vladimir Sementsov-Ogievskiy
  2020-06-10 11:56 ` David Edmondson
  0 siblings, 2 replies; 19+ messages in thread
From: Sam Eiderman @ 2020-06-10  5:28 UTC (permalink / raw)
  To: qemu-block, qemu-devel, david.edmondson
  Cc: vsementsov, eblake, Max Reitz, Tony Zhang

Hi,

168468fe19c8 ("qemu-img: Add --target-is-zero to convert") has added a
nice functionality for cloud scenarios:

* Create a virtual disk
* Convert a sparse image (qcow2, vmdk) to the virtual disk using
  --target-is-zero
* Use the virtual disk

This saves many unnecessary writes - a qcow2 with 1MB of allocated
data but with 100GB virtual size will be converted efficiently.

However, does this pose a problem if the virtual disk is not zero initialized?

Theoretically - if all unallocated blocks contain garbage - this
shouldn't matter, however what about allocated blocks of zero? Will
convert skip copying allocated zero blocks in the source image to the
target since it assumes that the target is zeroed out first thing?

Sam

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Vladimir Sementsov-Ogievskiy @ 2020-06-10  6:16 UTC
  To: Sam Eiderman, qemu-block, qemu-devel, david.edmondson
  Cc: Tony Zhang, Max Reitz

Hi Sam!

10.06.2020 08:28, Sam Eiderman wrote:
> Hi,
>
> 168468fe19c8 ("qemu-img: Add --target-is-zero to convert") has added a
> nice functionality for cloud scenarios:
>
> * Create a virtual disk

What is the format of your target?

> * Convert a sparse image (qcow2, vmdk) to the virtual disk using
> --target-is-zero
> * Use the virtual disk
>
> This saves many unnecessary writes - a qcow2 with 1MB of allocated
> data but with 100GB virtual size will be converted efficiently.
>
> However, does this pose a problem if the virtual disk is not zero initialized?
>
> Theoretically - if all unallocated blocks contain garbage - this
> shouldn't matter, however what about allocated blocks of zero? Will
> convert skip copying allocated zero blocks in the source image to the
> target since it assumes that the target is zeroed out first thing?
>

Yes, the feature is only for a target that really is zero-initialized;
it will skip "allocated" zeroes as well.

What you want - copying only the allocated blocks of a format that
supports backing files - looks like the "top" mode of the mirror and
backup block jobs. Have you considered using qemu itself (in stopped
mode, i.e. with the CPUs not running) or the new qemu-storage-daemon
instead of qemu-img? With this approach you'll have the whole power of
QMP commands to manage the block layer, including block jobs.

--
Best regards,
Vladimir
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10  6:28 UTC
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, qemu-devel, david.edmondson, eblake, Max Reitz, Tony Zhang

Hi,

My target format is a Persistent Disk on GCP.
https://cloud.google.com/persistent-disk

My use case is converting VMDKs to PDs, so I'm just using qemu-img
for the conversion (not using qemu as a hypervisor).

Luckily PDs are zeroed out when allocated, but I was asking in order to
understand the restrictions of qemu-img convert.

It could be useful for qemu-img convert to not zero out the disk but
still write allocated zeroes. I'm imagining cloud scenarios where,
instead of a virtual disk, the customer receives an attached physical
SSD device that is not zeroed out beforehand (only the encryption key is
changed, for privacy/security's sake), so reads will return garbage.

Sam
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Kevin Wolf @ 2020-06-10 11:37 UTC
  To: Sam Eiderman
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
      david.edmondson, Tony Zhang

On 10.06.2020 at 08:28, Sam Eiderman wrote:
> Hi,
>
> My target format is a Persistent Disk on GCP.
> https://cloud.google.com/persistent-disk
>
> And my use case is converting VMDKs to PDs so I'm just using qemu-img
> for the conversion (not using qemu as a hypervisor).
>
> Luckily PDs are zeroed out when allocated but I was asking to
> understand the restrictions of qemu-img convert.
>
> It could be useful for qemu-img convert to not zero out the disk, but
> do write allocated zeroes, I'm imagining cloud scenarios where instead
> of virtual disks the customer receives an attached physical SSD device
> that is not zeroed out beforehand (only encryption key changed, for
> privacy/security sake) so reads will return garbage.

But that's the default mode? Zeroing out the whole disk upfront is an
optimisation that we do if efficient zeroing is possible, but if we
can't, we just write explicit zeros where needed.

--target-is-zero means that you promise that the target is already
pre-zeroed so qemu-img can further optimise things. If you specify it
and the target doesn't contain zeros, but random data, you get garbage.

Kevin
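The difference between the default mode and --target-is-zero can be illustrated with a toy model. This is only a sketch, not qemu-img's code: the block states (DATA, ZERO, UNALLOC), the convert() helper and the write counter are hypothetical names, and it assumes (as is confirmed later in the thread) that the default mode writes explicit zeros for every source block that reads as zero.

```python
# Toy model of the semantics discussed in this thread; NOT qemu-img's code.
# All names here are hypothetical and exist only for illustration.
DATA, ZERO, UNALLOC = "data", "zero", "unalloc"


def convert(source, target, target_is_zero=False):
    """Copy `source` onto a copy of `target`; return (result, writes).

    source: list of (state, payload) tuples, one per block.
    target: list with the current content of each target block.
    """
    result = list(target)
    writes = 0
    for i, (state, payload) in enumerate(source):
        if state == DATA:
            result[i] = payload
            writes += 1
        elif not target_is_zero:
            # Default: blocks that read as zero in the source are
            # explicitly zeroed on the target, allocated or not.
            result[i] = 0
            writes += 1
        # With target_is_zero, zero blocks are skipped entirely: the
        # caller has promised that the target already reads as zero.
    return result, writes


source = [(DATA, 7), (ZERO, 0), (UNALLOC, 0), (DATA, 9)]
garbage = [0xAA, 0xBB, 0xCC, 0xDD]
zeroed = [0, 0, 0, 0]

default_on_garbage = convert(source, garbage)
flag_on_zeroed = convert(source, zeroed, target_is_zero=True)
flag_on_garbage = convert(source, garbage, target_is_zero=True)
```

In this model the flag halves the number of writes on a pre-zeroed target, but on a garbage target the skipped blocks keep their old content, which is the "you get garbage" case.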
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10 11:52 UTC
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
      david.edmondson, eblake, Max Reitz, Tony Zhang

I see,

I thought qemu-img (by default) checks the virtual size of the disk
before starting to copy allocated data, zeroes out all of the virtual
size (slowly) and then writes all the allocated data except for zeroes.

But from what I understand now, qemu-img finds that the target is raw
and cannot be efficiently zeroed, so it just writes all the allocated
data, including zeroes, leaving unallocated gaps in the virtual size
unwritten.

I have an 800MB VMDK image with a virtual size of 24GB.

The following takes roughly 3 minutes and 40 seconds (qemu 3.1.0):

  qemu-img convert "${IMAGE_PATH}" -p -O raw -S 512b /dev/sdc 2>&1

And this takes roughly 2 seconds (qemu 5.0.0):

  qemu-img convert "${IMAGE_PATH}" -n --target-is-zero -p -O raw /dev/sdc 2>&1

This probably means that there are ~23GB of *allocated* zeroes in this
VMDK. I'll check that.

Sam
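One way to check how much of an image actually holds data is to summarise the output of `qemu-img map --output=json <image>`. A minimal sketch, assuming each JSON extent carries the documented `length`, `data` and `zero` fields; the summarize_map() helper and the sample extents below are made up for illustration and do not come from the image discussed here.

```python
import json


def summarize_map(map_json):
    """Sum extent lengths from `qemu-img map --output=json` output.

    Returns the number of bytes backed by data and the number of bytes
    that are reported as reading zero.
    """
    data_bytes = 0
    zero_bytes = 0
    for extent in json.loads(map_json):
        if extent.get("data") and not extent.get("zero"):
            data_bytes += extent["length"]
        else:
            zero_bytes += extent["length"]
    return {"data": data_bytes, "reads_as_zero": zero_bytes}


# Illustrative input only: one 64 KiB data extent followed by a zero extent.
sample = """
[{"start": 0, "length": 65536, "depth": 0, "zero": false, "data": true,
  "offset": 327680},
 {"start": 65536, "length": 983040, "depth": 0, "zero": true, "data": false}]
"""
summary = summarize_map(sample)
```

As far as I know, for an image without a backing file the map reports unallocated areas as reading zero too, so this separates data from zeroes but does not by itself distinguish allocated zeroes from unallocated space.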
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: David Edmondson @ 2020-06-10 11:56 UTC
  To: Sam Eiderman, qemu-block, qemu-devel
  Cc: vsementsov, Tony Zhang, Max Reitz

On Wednesday, 2020-06-10 at 08:28:29 +03, Sam Eiderman wrote:

> Hi,
>
> 168468fe19c8 ("qemu-img: Add --target-is-zero to convert") has added a
> nice functionality for cloud scenarios:
>
> * Create a virtual disk
> * Convert a sparse image (qcow2, vmdk) to the virtual disk using
> --target-is-zero
> * Use the virtual disk
>
> This saves many unnecessary writes - a qcow2 with 1MB of allocated
> data but with 100GB virtual size will be converted efficiently.
>
> However, does this pose a problem if the virtual disk is not zero initialized?

As Vladimir indicated, the intent of the flag is supposed to be clear
from the name :-) If your storage doesn't read zeroes absent any earlier
writes, you probably don't want to be using it.

> Theoretically - if all unallocated blocks contain garbage - this
> shouldn't matter, however what about allocated blocks of zero? Will
> convert skip copying allocated zero blocks in the source image to the
> target since it assumes that the target is zeroed out first thing?

So something like a "--no-need-to-zero" flag would do what you want,
presuming that it would write known zeroes but no longer clean the
device before use?

dme.
--
You can't hide from the flipside.
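Whether a target really reads as zero can be checked before passing --target-is-zero, at the cost of reading it in full. A minimal sketch: reads_as_zero() is a hypothetical helper, not part of qemu, and the temporary files stand in for a real disk.

```python
import os
import tempfile


def reads_as_zero(path, chunk_size=1 << 20):
    """Return True if every byte read from `path` is zero."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached end of file without a non-zero byte
            if chunk.count(0) != len(chunk):
                return False


with tempfile.TemporaryDirectory() as tmp:
    clean = os.path.join(tmp, "clean.img")
    dirty = os.path.join(tmp, "dirty.img")
    with open(clean, "wb") as f:
        f.truncate(4 << 20)  # 4 MiB file that reads back as zeros
    with open(dirty, "wb") as f:
        f.truncate(4 << 20)
        f.seek(3 << 20)
        f.write(b"\xff")  # a single stale byte is enough to fail
    clean_ok = reads_as_zero(clean)
    dirty_ok = reads_as_zero(dirty)
```

A full read gives back some of the time the flag saves, but it issues no writes, so it may still be worthwhile when the provenance of the target is unclear.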
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10 12:19 UTC
  To: David Edmondson
  Cc: qemu-block, qemu-devel, Vladimir Sementsov-Ogievskiy, eblake,
      Max Reitz, Tony Zhang

Thanks David,

Yes, I imagine the following use case:

disk.vmdk is a 50 GB disk that contains 12 MB of zeroes at its beginning.
/dev/sda is a raw disk containing garbage.

I invoke:
qemu-img convert disk.vmdk -O raw /dev/sda

Required output:
The first 12 MB of /dev/sda contain zeros, the rest is garbage, and
qemu-img finishes fast.

Kevin, from what I understood from you, this is the default behavior.

So if my VMDK is causing trouble (the whole virtual size is being
written), that is probably because all the grains in the VMDK are
allocated as zeroes, right?

Thanks!
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Vladimir Sementsov-Ogievskiy @ 2020-06-10 13:36 UTC
  To: Sam Eiderman, David Edmondson
  Cc: Tony Zhang, qemu-devel, qemu-block, Max Reitz

10.06.2020 15:19, Sam Eiderman wrote:
> Thanks David,
>
> Yes, I imagine the following use case:
>
> disk.vmdk is a 50 GB disk that contains 12 MB binary of zeroes in its beginning.
> /dev/sda is a raw disk containing garbage
>
> I invoke:
> qemu-img convert disk.vmdk -O raw /dev/sda
>
> Required output:
> The first 12 MB of /dev/sda contain zeros, the rest garbage, qemu-img
> finishes fast.
>
> Kevin, from what I understood from you, this is the default behavior.
>
> So if my VMDK is causing trouble (all virtual size is being written)
> this is probably since all the grains in the VMDK are zero allocated
> right?
>
> Thanks!

I'm not sure that skipping unallocated clusters in qcow2/vmdk is the
default. From a brief look at the code, unallocated clusters are skipped
with the -B option, but that assumes a backing file, which is not your
case.

Let's check:

]# ./qemu-img create -f raw b 1M
Formatting 'b', fmt=raw size=1048576
]# ./qemu-img create -f qcow2 a 1M
Formatting 'a', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16 compression_type=zlib
]# ./qemu-io -c 'write -P 0xff 0 1M' -f raw b
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.05 sec (21.646 MiB/sec and 21.6457 ops/sec)
]# xxd b | head
00000000: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000020: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000030: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000040: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000050: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000060: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000070: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000080: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000090: ffff ffff ffff ffff ffff ffff ffff ffff  ................
]# ./qemu-img convert -f qcow2 -O raw a b
]# xxd b | head
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
]# ./qemu-io -c 'write -P 0xff 0 1M' -f raw b
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.05 sec (20.648 MiB/sec and 20.6478 ops/sec)
]# ./qemu-img create -f qcow2 base 1M
Formatting 'base', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16 compression_type=zlib
]# ./qemu-img convert -f qcow2 -O raw -B base a b
qemu-img: Backing file not supported for file format 'raw'

So you see, in a newly created qcow2 file all clusters are unallocated.
Still, by default qemu-img convert writes zeroes for all of them. And we
can't use -B with a raw target.

--
Best regards,
Vladimir
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Kevin Wolf @ 2020-06-10 14:06 UTC
  To: Sam Eiderman
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, David Edmondson,
      qemu-devel, Max Reitz, Tony Zhang

On 10.06.2020 at 14:19, Sam Eiderman wrote:
> Thanks David,
>
> Yes, I imagine the following use case:
>
> disk.vmdk is a 50 GB disk that contains 12 MB binary of zeroes in its beginning.
> /dev/sda is a raw disk containing garbage
>
> I invoke:
> qemu-img convert disk.vmdk -O raw /dev/sda
>
> Required output:
> The first 12 MB of /dev/sda contain zeros, the rest garbage, qemu-img
> finishes fast.
>
> Kevin, from what I understood from you, this is the default behavior.

Sorry, I misunderstood what you want. qemu-img will write zeros to all
unallocated parts, too. If it didn't do that, the resulting image on
/dev/sda wouldn't be a copy of disk.vmdk.

As the metadata (which blocks are allocated) cannot be preserved in raw
images, you wouldn't be able to tell which part of the image contains
valid data and which part needs to be interpreted as zeros even though
it contains random garbage.

What is your use case for this result where the actual virtual disk
content is mixed with garbage?

Kevin
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10 15:26 UTC
  To: Kevin Wolf
  Cc: David Edmondson, qemu-block, qemu-devel,
      Vladimir Sementsov-Ogievskiy, eblake, Max Reitz, Tony Zhang

Thanks for the clarification Kevin,

First I want to discuss unallocated blocks.
From my understanding, operating systems do not rely on disks being
zero initialized; on the contrary, physical disks usually contain
garbage.
So an unallocated block should never be treated as zero by any
real-world application.

Now, assuming that I only care about the allocated content of the
disk, I would like to save the I/O and time spent zeroing out
unallocated blocks.

A real-world example would be flushing a 500GB vmdk to a real SSD:
if the vmdk contains only 2GB of data, there is no point in writing
498GB of zeroes to that SSD and reducing its lifespan for nothing.

From what I understand, --target-is-zero will give me this behavior,
even though I would really be using it as a "--skip-prezeroing-target"
(sorry for the bad name).
(This is only true if *allocated zeroes* are indeed copied correctly
later.)

Sam
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10 15:29 UTC
  To: Kevin Wolf
  Cc: David Edmondson, qemu-block, qemu-devel,
      Vladimir Sementsov-Ogievskiy, eblake, Max Reitz, Tony Zhang

Excuse me,

Vladimir already pointed out in the first reply that with
--target-is-zero the allocated zeroes will not be written later either.

Sam
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: David Edmondson @ 2020-06-10 15:42 UTC
  To: Sam Eiderman, Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
      Tony Zhang

On Wednesday, 2020-06-10 at 18:29:33 +03, Sam Eiderman wrote:

> Excuse me,
>
> Vladimir already pointed out in the first comment that it will skip
> writing real zeroes later.

Right. That's why you want something like "--no-need-to-zero-initialise"
(the name keeps getting longer!), which would still write zeroes to the
blocks that should contain zeroes, as opposed to writing zeroes to
prepare the device.

dme.
--
He caught a fleeting glimpse of a man, moving uphill pursued by a bus.
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-10 15:47 UTC
  To: David Edmondson
  Cc: Kevin Wolf, qemu-block, qemu-devel, Vladimir Sementsov-Ogievskiy,
      eblake, Max Reitz, Tony Zhang

Ok great, thanks for making it clear.
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Eric Blake @ 2020-06-10 15:48 UTC
To: David Edmondson, Sam Eiderman, Kevin Wolf
Cc: Tony Zhang, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Max Reitz

On 6/10/20 10:42 AM, David Edmondson wrote:
> On Wednesday, 2020-06-10 at 18:29:33 +03, Sam Eiderman wrote:
>
>> Excuse me,
>>
>> Vladimir already pointed out in the first comment that it will skip
>> writing real zeroes later.
>
> Right. That's why you want something like "--no-need-to-zero-initialise"
> (the name keeps getting longer!), which would still write zeroes to the
> blocks that should contain zeroes, as opposed to writing zeroes to
> prepare the device.

Or maybe something like:

qemu-img convert --skip-unallocated

which says that a pre-zeroing pass may be attempted, but if it fails,
only the explicit zeroes need to be written rather than zeroes for all
unallocated areas in the source (so the resulting image will NOT be an
identical copy if there were any unallocated areas, but that the user
is okay with that).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: David Edmondson @ 2020-06-10 15:57 UTC
To: Eric Blake, Sam Eiderman, Kevin Wolf
Cc: Tony Zhang, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Max Reitz

On Wednesday, 2020-06-10 at 10:48:52 -05, Eric Blake wrote:

> On 6/10/20 10:42 AM, David Edmondson wrote:
>> On Wednesday, 2020-06-10 at 18:29:33 +03, Sam Eiderman wrote:
>>
>>> Excuse me,
>>>
>>> Vladimir already pointed out in the first comment that it will skip
>>> writing real zeroes later.
>>
>> Right. That's why you want something like "--no-need-to-zero-initialise"
>> (the name keeps getting longer!), which would still write zeroes to the
>> blocks that should contain zeroes, as opposed to writing zeroes to
>> prepare the device.
>
> Or maybe something like:
>
> qemu-img convert --skip-unallocated

This seems fine.

> which says that a pre-zeroing pass may be attempted, but if it fails,

This bit puzzles me. In what circumstances might we attempt but fail?
Does it really mean "if it can be done instantly, it will be done, but
not if it costs something"?

I'd be more inclined to go for "unallocated blocks will not be written",
without any attempts to pre-zero.

> only the explicit zeroes need to be written rather than zeroes for all
> unallocated areas in the source (so the resulting image will NOT be an
> identical copy if there were any unallocated areas, but that the user
> is okay with that).

dme.
--
Too much information, running through my brain.
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Eric Blake @ 2020-06-10 16:21 UTC
To: David Edmondson, Sam Eiderman, Kevin Wolf
Cc: Tony Zhang, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Max Reitz

On 6/10/20 10:57 AM, David Edmondson wrote:
> On Wednesday, 2020-06-10 at 10:48:52 -05, Eric Blake wrote:
>
>> On 6/10/20 10:42 AM, David Edmondson wrote:
>>> On Wednesday, 2020-06-10 at 18:29:33 +03, Sam Eiderman wrote:
>>>
>>>> Excuse me,
>>>>
>>>> Vladimir already pointed out in the first comment that it will skip
>>>> writing real zeroes later.
>>>
>>> Right. That's why you want something like "--no-need-to-zero-initialise"
>>> (the name keeps getting longer!), which would still write zeroes to the
>>> blocks that should contain zeroes, as opposed to writing zeroes to
>>> prepare the device.
>>
>> Or maybe something like:
>>
>> qemu-img convert --skip-unallocated
>
> This seems fine.
>
>> which says that a pre-zeroing pass may be attempted, but if it fails,
>
> This bit puzzles me. In what circumstances might we attempt but fail?
> Does it really mean "if it can be done instantly, it will be done, but
> not if it costs something"?

A fast pre-zeroing pass is faster than writing explicit zeroes. If such
a fast pass works, then you can avoid further I/O for all subsequent
zero sections; the unallocated sections will now happen to read as zero,
but that is not a problem since the content of unallocated portions is
not guaranteed.

But if pre-zeroing is not fast, then you have to spend the extra I/O to
explicitly zero the portions that are allocated but read as zero, while
still skipping the unallocated portions.

>
> I'd be more inclined to go for "unallocated blocks will not be written",
> without any attempts to pre-zero.

But that can be slower, when pre-zeroing is fast. "Unallocated blocks
need not be written" allows for optimizations, "unallocated blocks must
not be touched" does not.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
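[Editorial illustration] The behaviours being compared in this message can be sketched with a toy model. This is a minimal Python sketch, not qemu-img code: the `Block` states, the `convert` function and its mode names are hypothetical, and "skip-unallocated" models the flag proposed in this thread, which is not an existing qemu-img option. The fast pre-zeroing pass is assumed to cost no per-block writes.

```python
from enum import Enum


class Block(Enum):
    DATA = "data"        # allocated, non-zero contents
    ZERO = "zero"        # allocated, but reads as zero
    UNALLOC = "unalloc"  # unallocated in the source (reads as zero there)


GARBAGE = "garbage"


def convert(source, mode, fast_prezero=False):
    """Toy model of converting `source` onto a target that holds garbage.

    Returns (target, writes). Hypothetical modes:
      "default"          - write every block, zeroes included
      "target-is-zero"   - trust the target to be zero, skip all zeroes
      "skip-unallocated" - write data and explicit zeroes only; if a fast
                           pre-zeroing pass is available, use it instead
                           of writing the explicit zeroes one by one
    """
    target = [GARBAGE] * len(source)
    writes = 0

    # Assumption: a fast pre-zero is free and zeroes the whole target.
    prezeroed = mode == "skip-unallocated" and fast_prezero
    if prezeroed:
        target = ["zero"] * len(source)

    for i, block in enumerate(source):
        if block is Block.DATA:
            target[i] = "data"
            writes += 1
        elif block is Block.ZERO:
            explicit = mode == "default" or (
                mode == "skip-unallocated" and not prezeroed
            )
            if explicit:
                target[i] = "zero"
                writes += 1
        elif mode == "default":  # Block.UNALLOC
            target[i] = "zero"
            writes += 1
    return target, writes


if __name__ == "__main__":
    src = [Block.DATA, Block.ZERO, Block.UNALLOC, Block.UNALLOC]
    for mode, fast in [
        ("default", False),
        ("target-is-zero", False),
        ("skip-unallocated", False),
        ("skip-unallocated", True),
    ]:
        print(mode, fast, convert(src, mode, fast))
```

In this model, "target-is-zero" on a garbage target leaves garbage where the allocated zero block should be, while "skip-unallocated" produces correct allocated content on both paths and differs only in what the unallocated blocks end up holding.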
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: David Edmondson @ 2020-06-11 10:58 UTC
To: Eric Blake, Sam Eiderman, Kevin Wolf
Cc: Tony Zhang, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Max Reitz

On Wednesday, 2020-06-10 at 11:21:27 -05, Eric Blake wrote:

> On 6/10/20 10:57 AM, David Edmondson wrote:
>> On Wednesday, 2020-06-10 at 10:48:52 -05, Eric Blake wrote:
>>
>>> On 6/10/20 10:42 AM, David Edmondson wrote:
>>>> On Wednesday, 2020-06-10 at 18:29:33 +03, Sam Eiderman wrote:
>>>>
>>>>> Excuse me,
>>>>>
>>>>> Vladimir already pointed out in the first comment that it will skip
>>>>> writing real zeroes later.
>>>>
>>>> Right. That's why you want something like "--no-need-to-zero-initialise"
>>>> (the name keeps getting longer!), which would still write zeroes to the
>>>> blocks that should contain zeroes, as opposed to writing zeroes to
>>>> prepare the device.
>>>
>>> Or maybe something like:
>>>
>>> qemu-img convert --skip-unallocated
>>
>> This seems fine.
>>
>>> which says that a pre-zeroing pass may be attempted, but if it fails,
>>
>> This bit puzzles me. In what circumstances might we attempt but fail?
>> Does it really mean "if it can be done instantly, it will be done, but
>> not if it costs something"?
>
> A fast pre-zeroing pass is faster than writing explicit zeroes. If such
> a fast pass works, then you can avoid further I/O for all subsequent
> zero sections; the unallocated sections will now happen to read as zero,
> but that is not a problem since the content of unallocated portions is
> not guaranteed.
>
> But if pre-zeroing is not fast, then you have to spend the extra I/O to
> explicitly zero the portions that are allocated but read as zero, while
> still skipping the unallocated portions.

The lack of deterministic behaviour would worry me. If the caller can't
be sure whether the unallocated portions of the device will be zeroed or
not, it feels as though the number of potential use cases is reduced.

The optimisation is focused on images where there are a significant
number of allocated zero blocks. Is that a common case? (It obviously
exists, because many images generated before "--target-is-zero" will be
like that, but perhaps they would be better cleaned by an unallocator.)

>> I'd be more inclined to go for "unallocated blocks will not be written",
>> without any attempts to pre-zero.
>
> But that can be slower, when pre-zeroing is fast. "Unallocated blocks
> need not be written" allows for optimizations, "unallocated blocks must
> not be touched" does not.

"unallocated blocks may not be written" would be fine.

dme.
--
There is love in you.
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Kevin Wolf @ 2020-06-10 16:31 UTC
To: Sam Eiderman
Cc: Vladimir Sementsov-Ogievskiy, qemu-block, David Edmondson, qemu-devel, Max Reitz, Tony Zhang

On 10.06.2020 at 17:26, Sam Eiderman wrote:
> Thanks for the clarification Kevin,
>
> Well first I want to discuss unallocated blocks.
> From my understanding operating systems do not rely on disks to be
> zero initialized, on the contrary, physical disks usually contain
> garbage.
> So an unallocated block should never be treated as zero by any real
> world application.

I think this is a dangerous assumption to make. The guest did have
access to these unallocated blocks before, and they read as zero, so not
writing these to the conversion target does change the virtual disk.
Whether or not this is a harmless change for the guest depends on the
software running in the VM.

> Now assuming that I only care about the allocated content of the
> disks, I would like to save io/time zeroing out unallocated blocks.
>
> A real world example would be flushing a 500GB vmdk on a real SSD
> disk, if the vmdk contained only 2GB of data, no point in writing
> 498GB of zeroes to that SSD - reducing its lifespan for nothing.

Don't pretty much all SSDs support efficient zeroing/hole punching these
days so that the blocks would actually be deallocated on the disk level?

> Now from what I understand --target-is-zero will give me this behavior
> even though I really use it as a "--skip-prezeroing-target"
> (sorry for the bad name)
> (This is only true if later *allocated zeroes* are indeed copied correctly)

As you noticed later, it doesn't.

The behaviour you want is more like -B, except that you don't have a
backing file. If you also pass -n, the actual filename you pass isn't
even used, so I guess '-B "" -n' should do the trick?

Kevin
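[Editorial illustration] Kevin's first point, that skipping unallocated blocks changes what the guest sees, can be made concrete with a small model. This is a minimal Python sketch under stated assumptions, not qemu code: a disk is a list of block values, `None` marks an unallocated source block, and all function names are hypothetical.

```python
def source_reads(blocks):
    """What a guest reads from the source: unallocated blocks read as 0."""
    return [0 if b is None else b for b in blocks]


def convert_skipping_unallocated(blocks, target):
    """Copy only allocated blocks onto an existing target.

    Unallocated source blocks leave the target's old contents untouched.
    """
    out = list(target)
    for i, b in enumerate(blocks):
        if b is not None:
            out[i] = b
    return out


def differing_blocks(blocks, target):
    """Indices where the guest-visible content changes after conversion."""
    before = source_reads(blocks)
    after = convert_skipping_unallocated(blocks, target)
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]


if __name__ == "__main__":
    image = [7, 0, None, None]      # data, allocated zero, two unallocated
    garbage_disk = [0xAA] * 4       # target that previously held other data
    zeroed_disk = [0] * 4           # target known to be zero
    print(differing_blocks(image, garbage_disk))
    print(differing_blocks(image, zeroed_disk))
```

On the garbage target the two unallocated blocks read differently from the source, which is the change Kevin describes; whether that matters depends on the software in the guest. On a zeroed target nothing differs.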
* Re: Clarification regarding new qemu-img convert --target-is-zero flag
From: Sam Eiderman @ 2020-06-11 13:41 UTC
To: Kevin Wolf
Cc: David Edmondson, qemu-block, qemu-devel, Vladimir Sementsov-Ogievskiy, eblake, Max Reitz, Tony Zhang

On Wed, Jun 10, 2020 at 7:31 PM Kevin Wolf <kwolf@redhat.com> wrote:

> On 10.06.2020 at 17:26, Sam Eiderman wrote:
> > Thanks for the clarification Kevin,
> >
> > Well first I want to discuss unallocated blocks.
> > From my understanding operating systems do not rely on disks to be
> > zero initialized, on the contrary, physical disks usually contain
> > garbage.
> > So an unallocated block should never be treated as zero by any real
> > world application.
>
> I think this is a dangerous assumption to make. The guest did have
> access to these unallocated blocks before, and they read as zero, so not
> writing these to the conversion target does change the virtual disk.
> Whether or not this is a harmless change for the guest depends on the
> software running in the VM.
>

I see your point

>
> > Now assuming that I only care about the allocated content of the
> > disks, I would like to save io/time zeroing out unallocated blocks.
> >
> > A real world example would be flushing a 500GB vmdk on a real SSD
> > disk, if the vmdk contained only 2GB of data, no point in writing
> > 498GB of zeroes to that SSD - reducing its lifespan for nothing.
>
> Don't pretty much all SSDs support efficient zeroing/hole punching these
> days so that the blocks would actually be deallocated on the disk level?
>
> > Now from what I understand --target-is-zero will give me this behavior
> > even though I really use it as a "--skip-prezeroing-target"
> > (sorry for the bad name)
> > (This is only true if later *allocated zeroes* are indeed copied correctly)
>
> As you noticed later, it doesn't.
>
> The behaviour you want is more like -B, except that you don't have a
> backing file. If you also pass -n, the actual filename you pass isn't
> even used, so I guess '-B "" -n' should do the trick?
>
> Kevin
>