[Qemu-devel] do not lseek in qcow2 block

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] do not lseek in qcow2 block_status
@ 2018-12-24 15:50 Vladimir Sementsov-Ogievskiy
  2018-12-24 20:42 ` [Qemu-devel] [Qemu-block] " Nir Soffer
  2019-01-07 12:52 ` [Qemu-devel] " Max Reitz
  0 siblings, 2 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-24 15:50 UTC (permalink / raw)
  To: qemu-devel, qemu block
  Cc: Max Reitz, Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi,
	Eric Blake, Denis Lunev

Hi all!

bdrv_co_block_status digs bs->file for additional, more accurate search for hole
inside region, reported as DATA by bs since long-ago

   commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
   Author: Paolo Bonzini <pbonzini@redhat.com>
   Date:   Wed Sep 4 19:00:38 2013 +0200

       block: look for zero blocks in bs->file

This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows,
where are holes and where is data. But every block_status request calls lseek
additionally. Recently (and not the first time) we have faced the situation,
when lseek works slow. Of course, it is more about kernel implementation of it,
but why to involve lseek at all, if we have the information on qcow2 level?
Assume a big disk, full of data, in any iterative copying block job (or img
convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks
will have to iterate through all metadata up to the end of file. It's obviously
ineffective behavior.

I'm thinking about, how to properly workaround this thing, and I have some
questions.

What are the cases, when we really need digging for zeros in bs->file?

Why in mirror we care about discard vs write_zeros, keeping in mind, that
block_status will return ZERO instead for unallocated regions in some cases, not
related to guest view of disk (bdrv_unallocated_blocks_are_zero, backing file is
smaller than request.start)?

And finally, what to do with the problem?

Obviously, the simplest thing is just revert 5daa74a. What will it break?

Otherwise, I need to "revert" it in some cases, and, may be, it should be a
user-defined parameter.. Should it?  And what are the cases? We need to "free"
of 5daa74a at least qcow2.. But again, should it be format-specific option, or
something more generic?

Then, if it would be separate parameter, how should it interfere with
want_zeros, or even, should it be want_zeros, reworked a bit to be want_lseek?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] do not lseek in qcow2 block_status
  2018-12-24 15:50 [Qemu-devel] do not lseek in qcow2 block_status Vladimir Sementsov-Ogievskiy
@ 2018-12-24 20:42 ` Nir Soffer
  2018-12-25  9:45   ` Vladimir Sementsov-Ogievskiy
  2019-01-07 12:52 ` [Qemu-devel] " Max Reitz
  1 sibling, 1 reply; 6+ messages in thread
From: Nir Soffer @ 2018-12-24 20:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu block, Kevin Wolf, Fam Zheng, Denis Lunev,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini

On Mon, Dec 24, 2018 at 5:50 PM Vladimir Sementsov-Ogievskiy <
vsementsov@virtuozzo.com> wrote:

> Hi all!
>
> bdrv_co_block_status digs bs->file for additional, more accurate search
> for hole
> inside region, reported as DATA by bs since long-ago
>
>    commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>    Author: Paolo Bonzini <pbonzini@redhat.com>
>    Date:   Wed Sep 4 19:00:38 2013 +0200
>
>        block: look for zero blocks in bs->file
>
> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
> knows,
> where are holes and where is data. But every block_status request calls
> lseek
> additionally. Recently (and not the first time) we have faced the
> situation,
> when lseek works slow. Of course, it is more about kernel implementation
> of it,
> but why to involve lseek at all, if we have the information on qcow2 level?
> Assume a big disk, full of data, in any iterative copying block job (or img
> convert) we'll call lseek(HOLE) on every iteration, and each of these
> lseeks
> will have to iterate through all metadata up to the end of file. It's
> obviously
> ineffective behavior.

I'm thinking about, how to properly workaround this thing, and I have some
> questions.
>
> What are the cases, when we really need digging for zeros in bs->file?
>
> Why in mirror we care about discard vs write_zeros, keeping in mind, that
> block_status will return ZERO instead for unallocated regions in some
> cases, not
> related to guest view of disk (bdrv_unallocated_blocks_are_zero, backing
> file is
> smaller than request.start)?
>
> And finally, what to do with the problem?
>
> Obviously, the simplest thing is just revert 5daa74a. What will it break?
>
> Otherwise, I need to "revert" it in some cases, and, may be, it should be a
> user-defined parameter.. Should it?  And what are the cases? We need to
> "free"
> of 5daa74a at least qcow2.. But again, should it be format-specific
> option, or
> something more generic?
>
> Then, if it would be separate parameter, how should it interfere with
> want_zeros, or even, should it be want_zeros, reworked a bit to be
> want_lseek?
>

How about having 2 passes:
1. seek DATA/HOLE for entire image, keep result in memory
2. use the result when modifying ranges

This can also help allocation of clusters in a destination image:
1. seek DATA/HOLE in source image
2. allocate blocks in destination
3. convert using out of order writes *without* causing fragmentation in the
destination image.

Here is an example using this approach when copying images to qem-nbd. This
is a pretty
simplistic single threaded implementation in python, using "qemu-img map"
to find the data
and zero ranges. It is faster than qemu-img convert :-)

https://github.com/oVirt/ovirt-imageio/blob/master/examples/nbd-client
https://github.com/oVirt/ovirt-imageio/commit/bec7cb677e33c6d0a7645c367af3d56b70b666db


Nir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] do not lseek in qcow2 block_status
  2018-12-24 20:42 ` [Qemu-devel] [Qemu-block] " Nir Soffer
@ 2018-12-25  9:45   ` Vladimir Sementsov-Ogievskiy
  2018-12-28 13:56     ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-25  9:45 UTC (permalink / raw)
  To: Nir Soffer
  Cc: qemu-devel, qemu block, Kevin Wolf, Fam Zheng, Denis Lunev,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini

24.12.2018 23:42, Nir Soffer wrote:
> On Mon, Dec 24, 2018 at 5:50 PM Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com <mailto:vsementsov@virtuozzo.com>> wrote:
> 
>     Hi all!
> 
>     bdrv_co_block_status digs bs->file for additional, more accurate search for hole
>     inside region, reported as DATA by bs since long-ago
> 
>         commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>         Author: Paolo Bonzini <pbonzini@redhat.com <mailto:pbonzini@redhat.com>>
>         Date:   Wed Sep 4 19:00:38 2013 +0200
> 
>             block: look for zero blocks in bs->file
> 
>     This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows,
>     where are holes and where is data. But every block_status request calls lseek
>     additionally. Recently (and not the first time) we have faced the situation,
>     when lseek works slow. Of course, it is more about kernel implementation of it,
>     but why to involve lseek at all, if we have the information on qcow2 level?
>     Assume a big disk, full of data, in any iterative copying block job (or img
>     convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks
>     will have to iterate through all metadata up to the end of file. It's obviously
>     ineffective behavior.
> 
>     I'm thinking about, how to properly workaround this thing, and I have some
>     questions.
> 
>     What are the cases, when we really need digging for zeros in bs->file?
> 
>     Why in mirror we care about discard vs write_zeros, keeping in mind, that
>     block_status will return ZERO instead for unallocated regions in some cases, not
>     related to guest view of disk (bdrv_unallocated_blocks_are_zero, backing file is
>     smaller than request.start)?
> 
>     And finally, what to do with the problem?
> 
>     Obviously, the simplest thing is just revert 5daa74a. What will it break?
> 
>     Otherwise, I need to "revert" it in some cases, and, may be, it should be a
>     user-defined parameter.. Should it?  And what are the cases? We need to "free"
>     of 5daa74a at least qcow2.. But again, should it be format-specific option, or
>     something more generic?
> 
>     Then, if it would be separate parameter, how should it interfere with
>     want_zeros, or even, should it be want_zeros, reworked a bit to be want_lseek?
> 
> 
> How about having 2 passes:
> 1. seek DATA/HOLE for entire image, keep result in memory
> 2. use the result when modifying ranges
Yes I think this is possible, but in case of qcow2, again, we don't need lseek at all,
and most optimal way is not call it.

> 
> This can also help allocation of clusters in a destination image:
> 1. seek DATA/HOLE in source image
> 2. allocate blocks in destination
> 3. convert using out of order writes *without* causing fragmentation in the destination image.

Interesting idea, I've thought about something like this too. It should work for img convert, as
disk is static, but not for mirror..

Ok, for not static disk, we can implement cache filter driver for caching block_status results
(and to cache lseek results, we need to allow block_status return more information than requested,
like we do in NBD protocol), but again, for qcow2 we already have additional metadata layer which
is qcow2 itself, so lseek is redundant.

> 
> Here is an example using this approach when copying images to qem-nbd. This is a pretty
> simplistic single threaded implementation in python, using "qemu-img map" to find the data
> and zero ranges. It is faster than qemu-img convert :-)
> 
> https://github.com/oVirt/ovirt-imageio/blob/master/examples/nbd-client
> https://github.com/oVirt/ovirt-imageio/commit/bec7cb677e33c6d0a7645c367af3d56b70b666db
> 
> Nir


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] do not lseek in qcow2 block_status
  2018-12-25  9:45   ` Vladimir Sementsov-Ogievskiy
@ 2018-12-28 13:56     ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2018-12-28 13:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Nir Soffer
  Cc: qemu-devel, qemu block, Kevin Wolf, Fam Zheng, Denis Lunev,
	Max Reitz, Stefan Hajnoczi

On 25/12/18 10:45, Vladimir Sementsov-Ogievskiy wrote:
>>
>>     What are the cases, when we really need digging for zeros in bs->file?

It's needed for preallocation=metadata.

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] do not lseek in qcow2 block_status
  2018-12-24 15:50 [Qemu-devel] do not lseek in qcow2 block_status Vladimir Sementsov-Ogievskiy
  2018-12-24 20:42 ` [Qemu-devel] [Qemu-block] " Nir Soffer
@ 2019-01-07 12:52 ` Max Reitz
  2019-01-08 11:28   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 6+ messages in thread
From: Max Reitz @ 2019-01-07 12:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu block
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Eric Blake,
	Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

On 24.12.18 16:50, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> bdrv_co_block_status digs bs->file for additional, more accurate search for hole
> inside region, reported as DATA by bs since long-ago
> 
>    commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>    Author: Paolo Bonzini <pbonzini@redhat.com>
>    Date:   Wed Sep 4 19:00:38 2013 +0200
> 
>        block: look for zero blocks in bs->file
> 
> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows,
> where are holes and where is data. But every block_status request calls lseek
> additionally. Recently (and not the first time) we have faced the situation,
> when lseek works slow. Of course, it is more about kernel implementation of it,
> but why to involve lseek at all, if we have the information on qcow2 level?
> Assume a big disk, full of data, in any iterative copying block job (or img
> convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks
> will have to iterate through all metadata up to the end of file. It's obviously
> ineffective behavior.
> 
> I'm thinking about, how to properly workaround this thing, and I have some
> questions.
> 
> What are the cases, when we really need digging for zeros in bs->file?
> 
> Why in mirror we care about discard vs write_zeros, keeping in mind, that
> block_status will return ZERO instead for unallocated regions in some cases, not
> related to guest view of disk (bdrv_unallocated_blocks_are_zero, backing file is
> smaller than request.start)?
> 
> And finally, what to do with the problem?
> 
> Obviously, the simplest thing is just revert 5daa74a. What will it break?
> 
> Otherwise, I need to "revert" it in some cases, and, may be, it should be a
> user-defined parameter.. Should it?  And what are the cases? We need to "free"
> of 5daa74a at least qcow2.. But again, should it be format-specific option, or
> something more generic?
> 
> Then, if it would be separate parameter, how should it interfere with
> want_zeros, or even, should it be want_zeros, reworked a bit to be want_lseek?

As far as I have understood, the @want_zero parameter basically is
exactly this.  If you want the call to be quick, you set it to false.
If you want the result to be accurate, you set it to true.

Well, for raw images it'll still go down to bs->file even with
want_zero=false, but I don't think there's much else we can do, really.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] do not lseek in qcow2 block_status
  2019-01-07 12:52 ` [Qemu-devel] " Max Reitz
@ 2019-01-08 11:28   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-08 11:28 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu block
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Eric Blake,
	Denis Lunev

07.01.2019 15:52, Max Reitz wrote:
> On 24.12.18 16:50, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> bdrv_co_block_status digs bs->file for additional, more accurate search for hole
>> inside region, reported as DATA by bs since long-ago
>>
>>     commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>>     Author: Paolo Bonzini <pbonzini@redhat.com>
>>     Date:   Wed Sep 4 19:00:38 2013 +0200
>>
>>         block: look for zero blocks in bs->file
>>
>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows,
>> where are holes and where is data. But every block_status request calls lseek
>> additionally. Recently (and not the first time) we have faced the situation,
>> when lseek works slow. Of course, it is more about kernel implementation of it,
>> but why to involve lseek at all, if we have the information on qcow2 level?
>> Assume a big disk, full of data, in any iterative copying block job (or img
>> convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks
>> will have to iterate through all metadata up to the end of file. It's obviously
>> ineffective behavior.
>>
>> I'm thinking about, how to properly workaround this thing, and I have some
>> questions.
>>
>> What are the cases, when we really need digging for zeros in bs->file?
>>
>> Why in mirror we care about discard vs write_zeros, keeping in mind, that
>> block_status will return ZERO instead for unallocated regions in some cases, not
>> related to guest view of disk (bdrv_unallocated_blocks_are_zero, backing file is
>> smaller than request.start)?
>>
>> And finally, what to do with the problem?
>>
>> Obviously, the simplest thing is just revert 5daa74a. What will it break?
>>
>> Otherwise, I need to "revert" it in some cases, and, may be, it should be a
>> user-defined parameter.. Should it?  And what are the cases? We need to "free"
>> of 5daa74a at least qcow2.. But again, should it be format-specific option, or
>> something more generic?
>>
>> Then, if it would be separate parameter, how should it interfere with
>> want_zeros, or even, should it be want_zeros, reworked a bit to be want_lseek?
> 
> As far as I have understood, the @want_zero parameter basically is
> exactly this.  If you want the call to be quick, you set it to false.
> If you want the result to be accurate, you set it to true.
> 
> Well, for raw images it'll still go down to bs->file even with
> want_zero=false, but I don't think there's much else we can do, really.
> 
> Max
> 

want_zero also affects, that unallocated areas with bdrv_unallocated_blocks_are_zero() is
true are still reported unallocated (without ZERO) if want_zero is not set. (and similar for
unallocated areas after end of backing file)..

Also, want_zero is currently internal parameter, just set to true in _block_status and false in
_is_allocated.

Ok, finally, is it OK to add no-lseek option for 'file' block driver, which will behave exactly
as want_zero = false?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-01-08 11:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-24 15:50 [Qemu-devel] do not lseek in qcow2 block_status Vladimir Sementsov-Ogievskiy
2018-12-24 20:42 ` [Qemu-devel] [Qemu-block] " Nir Soffer
2018-12-25  9:45   ` Vladimir Sementsov-Ogievskiy
2018-12-28 13:56     ` Paolo Bonzini
2019-01-07 12:52 ` [Qemu-devel] " Max Reitz
2019-01-08 11:28   ` Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).