[Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount

* [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
@ 2011-07-21 16:17 Frediano Ziglio
  2011-07-22  8:05 ` Kevin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Frediano Ziglio @ 2011-07-21 16:17 UTC (permalink / raw)
  To: qemu-devel

Hi,
  after a snapshot is taken, many write operations are currently quite
slow due to
- refcount updates (decrement old and increment new)
- cluster allocation and file expansion
- read-modify-write on partial clusters

I found 2 ways to improve refcount performance.

Method 1 - Lazy count
The main idea is not to count references from the current snapshot, that
is, the current snapshot counts as 0. This would require adding a
current_snapshot field to the header and updating refcounts when the
current snapshot is changed. The operations then behave as follows:
- creating a snapshot: performance is the same, you just increment for
the old snapshot instead of the new one
- normal write operations: as the current snapshot counts as 0, there is
nothing to update, so no refcount data is written
- changing the current snapshot: this is the worst case, you have to
increment for the current snapshot and decrement for the new one, so it
will take twice as long
- deleting a snapshot: if it is the current one, just set
current_snapshot to a dummy, non-existent value; if it is not the
current one, just decrement the counters, so performance does not change
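
The bookkeeping of Method 1 can be sketched as a toy model in Python.
This is only an illustration under stated assumptions, not qcow2 code:
the class, its method names and the in-memory dictionaries are all
hypothetical, and snapshots are modelled as plain sets of cluster ids.

```python
class LazyRefcountImage:
    """Toy model (hypothetical): stored refcounts exclude the current snapshot."""

    def __init__(self, clusters):
        self.current = "A"
        self.snapshots = {"A": set(clusters)}  # snapshot name -> cluster ids
        self.stored = {}                       # cluster id -> stored refcount
        self.refcount_updates = 0              # number of refcount entries touched

    def _adjust(self, clusters, delta):
        for c in clusters:
            self.stored[c] = self.stored.get(c, 0) + delta
            self.refcount_updates += 1

    def create_snapshot(self, name):
        # The old snapshot stops being current, so its references now count.
        old = self.snapshots[self.current]
        self._adjust(old, +1)
        self.snapshots[name] = set(old)
        self.current = name

    def cow_write(self, old_cluster, new_cluster):
        # Copy-on-write in the current snapshot: no stored refcount changes,
        # because references held by the current snapshot count as 0.
        cur = self.snapshots[self.current]
        cur.discard(old_cluster)
        cur.add(new_cluster)

    def switch_current(self, name):
        # Worst case: increment for the snapshot we leave,
        # decrement for the snapshot we enter.
        self._adjust(self.snapshots[self.current], +1)
        self._adjust(self.snapshots[name], -1)
        self.current = name


img = LazyRefcountImage({0, 1})
img.create_snapshot("B")            # A is frozen, B becomes current
after_create = dict(img.stored)
img.cow_write(0, 2)                 # B replaces cluster 0 with new cluster 2
updates_before_switch = img.refcount_updates
img.switch_current("A")
```

In this model the copy-on-write touches no refcount entry, while the
switch back to "A" touches every cluster of both snapshots. Note that an
unknown cluster also reads as 0 here, so the model cannot tell a free
cluster from one owned only by the current snapshot.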

Method 2 - Read-only parent
Here parents are read-only; instead of storing a refcount, store a
numeric id of the owner. If the owner is not the current snapshot, copy
the cluster and change the copy. Consider this situation:

A --- B --- C

B cannot be changed, so in order to "change" B you have to create a new
snapshot

A --- B --- C
         \--- D

and change D. It can take more space because in this case you have an
additional snapshot.

Operations:
- creating a snapshot: really fast, as you don't have to change any
ownership
- normal write operations: if the owner is not the same, allocate a new
cluster and just store the new owner for the new cluster. Ownership for
past-the-end clusters could also all be set to the current owner in
order to collapse allocations
- changing the current snapshot: no changes required for owners
- deleting a snapshot: only possible if it has no child or a single
child. It will require scanning all L2 tables, merging them and updating
the owner. Same performance
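
A minimal Python sketch of the owner-id idea follows. It is a toy model
under stated assumptions, not an implementation of the qcow2 format: the
class, its fields and the per-snapshot mapping dictionaries are
hypothetical stand-ins for the L1/L2 tables.

```python
class OwnerImage:
    """Toy model (hypothetical): each cluster records the snapshot that owns it."""

    def __init__(self):
        self.owner = {}         # physical cluster -> id of the owning snapshot
        self.maps = {0: {}}     # snapshot id -> {guest cluster -> physical cluster}
        self.parent = {0: None}
        self.current = 0
        self.next_cluster = 0
        self.next_id = 1

    def _alloc(self):
        c = self.next_cluster
        self.next_cluster += 1
        self.owner[c] = self.current   # only the new cluster gets an owner entry
        return c

    def write(self, guest_cluster):
        mapping = self.maps[self.current]
        phys = mapping.get(guest_cluster)
        if phys is None or self.owner[phys] != self.current:
            # Unallocated, or owned by a read-only parent: allocate a new cluster.
            phys = self._alloc()
            mapping[guest_cluster] = phys
        return phys

    def create_snapshot(self):
        # The current snapshot becomes a read-only parent; no owner changes.
        child = self.next_id
        self.next_id += 1
        self.maps[child] = dict(self.maps[self.current])
        self.parent[child] = self.current
        self.current = child
        return child


img2 = OwnerImage()
first = img2.write(5)          # allocated, owned by snapshot 0
again = img2.write(5)          # same owner, written in place
child = img2.create_snapshot()
copied = img2.write(5)         # owner differs, so a new cluster is allocated
```

Creating the snapshot only copies the mapping and moves the current
pointer; the parent's cluster and its owner entry stay untouched.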

Regards
  Frediano Ziglio

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-21 16:17 [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount Frediano Ziglio
@ 2011-07-22  8:05 ` Kevin Wolf
  2011-07-22  9:13   ` Frediano Ziglio
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2011-07-22  8:05 UTC (permalink / raw)
  To: Frediano Ziglio; +Cc: qemu-devel

On 21.07.2011 18:17, Frediano Ziglio wrote:
> Hi,
>   after a snapshot is taken currently many write operations are quite
> slow due to
> - refcount updates (decrement old and increment new )
> - cluster allocation and file expansion
> - read-modify-write on partial clusters
> 
> I found 2 way to improve refcount performance
> 
> Method 1 - Lazy count
> Mainly do not take into account count for current snapshot, that is
> current snapshot counts as 0. This would require to add a
> current_snapshot in header and update refcount when current is changed.
> So for these operation
> - creating snapshot, performance are the same, just increment for old
> snapshot instead of the new one
> - normal write operations. As current snaphot counts as 0 there is not
> operations here so do not write any data
> - changing current snapshot, this is the worst case, you have to
> increment for the current snapshot and decrement for the new so it will
> take twice
> - deleting snapshot, if is the current just set current_snapshot to a
> dummy not existing value, if is not the current just decrement counters,
> no performance changes

How would you do cluster allocation if you don't have refcounts any more
that can tell you if a cluster is used or not?

> Method 2 - Read-only parent
> Here parents are readonly, instead of storing a refcount store a numeric
> id of the owner. If the owner is not current copy the cluster and change
> it. Considering this situation
> 
> A --- B --- C
> 
> B cannot be changed so in order to "change" B you have to create a new
> snapshot
> 
> A --- B --- C
>          \--- D
> 
> and change D. It can take more space cause you have in this case an
> additional snapshot.
> 
> Operations:
> - creating snapshot, really fast as you don't have to change any
> ownership
> - normal write operations. If owner is not the same allocate a new
> cluster and just store a new owner for new cluster. Also ownership for
> past-to-end cluster could be set all to current owner in order to
> collapse allocations
> - changing current snapshot, no changes required for owners
> - deleting snapshot. Only possible if you have no child or a single
> child. Will require to scan all l2 tables and merge and update owner.

I think this has similar characteristics to what we have with external
snapshots (i.e. backing files). The advantage of applying it to internal
snapshots is that when deleting a snapshot you don't have to copy around
all the data.

Probably this change could even be done transparently for the user, so
that B still appears to be writeable, but in fact refers to D now.


Anyway, have you checked how bad the refcount work really is? I think
that writing the VM state takes a lot longer, so that optimising the
refcount update may be the wrong approach, especially if it requires a
format change. My results with qemu-img snapshot suggest that it's not
worth it:

kwolf@dhcp-5-188:~/images$ ~/source/qemu/qemu-img info scratch.qcow2
image: scratch.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 4.0G
cluster_size: 65536
kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -c test scratch.qcow2

real    0m0.116s
user    0m0.009s
sys     0m0.040s
kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -d test scratch.qcow2

real    0m0.084s
user    0m0.011s
sys     0m0.044s

Kevin

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-22  8:05 ` Kevin Wolf
@ 2011-07-22  9:13   ` Frediano Ziglio
  2011-07-22  9:29     ` Stefan Hajnoczi
  2011-07-22  9:30     ` Kevin Wolf
  0 siblings, 2 replies; 7+ messages in thread
From: Frediano Ziglio @ 2011-07-22  9:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

2011/7/22 Kevin Wolf <kwolf@redhat.com>:
> On 21.07.2011 18:17, Frediano Ziglio wrote:
>> Hi,
>>   after a snapshot is taken currently many write operations are quite
>> slow due to
>> - refcount updates (decrement old and increment new )
>> - cluster allocation and file expansion
>> - read-modify-write on partial clusters
>>
>> I found 2 way to improve refcount performance
>>
>> Method 1 - Lazy count
>> Mainly do not take into account count for current snapshot, that is
>> current snapshot counts as 0. This would require to add a
>> current_snapshot in header and update refcount when current is changed.
>> So for these operation
>> - creating snapshot, performance are the same, just increment for old
>> snapshot instead of the new one
>> - normal write operations. As current snaphot counts as 0 there is not
>> operations here so do not write any data
>> - changing current snapshot, this is the worst case, you have to
>> increment for the current snapshot and decrement for the new so it will
>> take twice
>> - deleting snapshot, if is the current just set current_snapshot to a
>> dummy not existing value, if is not the current just decrement counters,
>> no performance changes
>
> How would you do cluster allocation if you don't have refcounts any more
> that can tell you if a cluster is used or not?
>

You still have refcounts; it is only that the current snapshot counts as
0. An example may help. Start with snapshot "A": A is current and counts
as zero, so all refcounts are 0. Now we create a snapshot "B" and make
it current, so the refcounts become 1.

A --- B

If you change a cluster in snapshot "B", the counts are still 1. If you
go back to "A", the counters are incremented (because you leave B) and
then decremented (because you enter A).

Perhaps the problem is how to distinguish a 0 that means "allocated in
current" from a 0 that means "not allocated". Yes, with what I proposed
above that is a problem, but we can easily use -1 for "not allocated".
If current and the refcount is 0, mark it as -1; if not current, we
would have to increment the counters of current, mark current as -1 and
then decrement for the deletion. Yes, in this case it takes twice the
time.

>> Method 2 - Read-only parent
>> Here parents are readonly, instead of storing a refcount store a numeric
>> id of the owner. If the owner is not current copy the cluster and change
>> it. Considering this situation
>>
>> A --- B --- C
>>
>> B cannot be changed so in order to "change" B you have to create a new
>> snapshot
>>
>> A --- B --- C
>>          \--- D
>>
>> and change D. It can take more space cause you have in this case an
>> additional snapshot.
>>
>> Operations:
>> - creating snapshot, really fast as you don't have to change any
>> ownership
>> - normal write operations. If owner is not the same allocate a new
>> cluster and just store a new owner for new cluster. Also ownership for
>> past-to-end cluster could be set all to current owner in order to
>> collapse allocations
>> - changing current snapshot, no changes required for owners
>> - deleting snapshot. Only possible if you have no child or a single
>> child. Will require to scan all l2 tables and merge and update owner.
>
> I think this has similar characteristics as we have with external
> snapshots (i.e. backing files). The advantage is that with applying it
> to internal snapshots is that when deleting a snapshot you don't have to
> copy around all the data.
>
> Probably this change could even be done transparently for the user, so
> that B still appears to be writeable, but in fact refers to D now.
>
>
> Anyway, have you checked how bad the refcount work really is? I think
> that writing the VM state takes a lot longer, so that optimising the
> refcount update may be the wrong approach, especially if it requires a
> format change. My results with qemu-img snapshot suggest that it's not
> worth it:
>
> kwolf@dhcp-5-188:~/images$ ~/source/qemu/qemu-img info scratch.qcow2
> image: scratch.qcow2
> file format: qcow2
> virtual size: 8.0G (8589934592 bytes)
> disk size: 4.0G
> cluster_size: 65536
> kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -c test
> scratch.qcow2
>
> real    0m0.116s
> user    0m0.009s
> sys     0m0.040s
> kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -d test
> scratch.qcow2
>
> real    0m0.084s
> user    0m0.011s
> sys     0m0.044s
>
> Kevin
>

I'm not worried about the time needed to take the snapshot itself, but
about normal use after a snapshot has been taken. As you stated, while
taking a snapshot you can disable cache writethrough, which makes it
very fast, but during normal operation you can't.

Personally, I'm also pondering a log to allow collapsing metadata
updates, or even an external (separate file) full log (with data), to
try to reduce the overhead caused by read-modify-write during partial
cluster updates and to reduce file fragmentation. But as you can see
from my patches, I'm still getting familiar with the Qemu code.

Regards
  Frediano

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-22  9:13   ` Frediano Ziglio
@ 2011-07-22  9:29     ` Stefan Hajnoczi
  2011-07-22 14:21       ` Frediano Ziglio
  2011-07-22  9:30     ` Kevin Wolf
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2011-07-22  9:29 UTC (permalink / raw)
  To: Frediano Ziglio; +Cc: Kevin Wolf, qemu-devel

On Fri, Jul 22, 2011 at 10:13 AM, Frediano Ziglio <freddy77@gmail.com> wrote:
> 2011/7/22 Kevin Wolf <kwolf@redhat.com>:
>> On 21.07.2011 18:17, Frediano Ziglio wrote:
>>> Hi,
>>>   after a snapshot is taken currently many write operations are quite
>>> slow due to
>>> - refcount updates (decrement old and increment new )
>>> - cluster allocation and file expansion
>>> - read-modify-write on partial clusters
>>>
>>> I found 2 way to improve refcount performance
>>>
>>> Method 1 - Lazy count
>>> Mainly do not take into account count for current snapshot, that is
>>> current snapshot counts as 0. This would require to add a
>>> current_snapshot in header and update refcount when current is changed.
>>> So for these operation
>>> - creating snapshot, performance are the same, just increment for old
>>> snapshot instead of the new one
>>> - normal write operations. As current snaphot counts as 0 there is not
>>> operations here so do not write any data
>>> - changing current snapshot, this is the worst case, you have to
>>> increment for the current snapshot and decrement for the new so it will
>>> take twice
>>> - deleting snapshot, if is the current just set current_snapshot to a
>>> dummy not existing value, if is not the current just decrement counters,
>>> no performance changes
>>
>> How would you do cluster allocation if you don't have refcounts any more
>> that can tell you if a cluster is used or not?
>>
>
> You have refcount, is only that current snapshot counts as 0. An
> example may help, start with "A" snapshot A counts as zero so all
> refcounts are 0, now we create a snapshot "B" and make it current so
> refcounts are 1
>
> A --- B
>
> If you change a cluster in snapshot "B" counts are still 1. If you go
> back to "A" counters are increment (cause you leave B) and then
> decrement (cause you enter in A).
>
> Perhaps the problem is how to distinguish 0 from "allocated in
> current" and "not allocated". Yes, with which I suppose above it's a
> problem, but we can easily use -1 as not allocated. If current and
> refcount 0 mark as -1, if not current we would have to increment
> counters of current, mark current as -1 than decrement for deleting,
> yes in this case you have twice the time.

I'm not sure I follow your last sentence but just having a different
refcount value for "not allocated" vs "allocated" means allocating
write requests will need to update refcounts.

But are non-append allocations common enough that we should bother
with them in the allocating write path?  Can we append to the end of
the image file for allocating writes and handle defragmentation
elsewhere (i.e. get rid of unallocated clusters in the middle of the
file)?
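
As an illustration of the append-only idea, here is a minimal Python
sketch of an allocator that never consults refcounts. It is an
assumption-laden toy: the class name is hypothetical, the cluster size
is the 65536 reported by qemu-img info earlier in the thread, and
defragmentation is left out entirely.

```python
CLUSTER_SIZE = 65536  # bytes, as in the qemu-img info output quoted earlier


class AppendAllocator:
    """Hypothetical allocator: new clusters always go at the end of the file."""

    def __init__(self, file_size):
        # Round the current file size up to a cluster boundary.
        self.end = -(-file_size // CLUSTER_SIZE) * CLUSTER_SIZE

    def alloc(self, nb_clusters=1):
        # No free-cluster search and no refcount lookup: just extend the file.
        offset = self.end
        self.end += nb_clusters * CLUSTER_SIZE
        return offset


allocator = AppendAllocator(100000)
first_offset = allocator.alloc()      # one cluster
second_offset = allocator.alloc(2)    # two contiguous clusters
```

Holes left in the middle of the file would have to be reclaimed by a
separate pass, which this sketch does not attempt.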

Stefan

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-22  9:29     ` Stefan Hajnoczi
@ 2011-07-22 14:21       ` Frediano Ziglio
  2011-07-22 16:47         ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Frediano Ziglio @ 2011-07-22 14:21 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel

2011/7/22 Stefan Hajnoczi <stefanha@gmail.com>:
> On Fri, Jul 22, 2011 at 10:13 AM, Frediano Ziglio <freddy77@gmail.com> wrote:
>> 2011/7/22 Kevin Wolf <kwolf@redhat.com>:
>>> On 21.07.2011 18:17, Frediano Ziglio wrote:
>>>> Hi,
>>>>   after a snapshot is taken currently many write operations are quite
>>>> slow due to
>>>> - refcount updates (decrement old and increment new )
>>>> - cluster allocation and file expansion
>>>> - read-modify-write on partial clusters
>>>>
>>>> I found 2 way to improve refcount performance
>>>>
>>>> Method 1 - Lazy count
>>>> Mainly do not take into account count for current snapshot, that is
>>>> current snapshot counts as 0. This would require to add a
>>>> current_snapshot in header and update refcount when current is changed.
>>>> So for these operation
>>>> - creating snapshot, performance are the same, just increment for old
>>>> snapshot instead of the new one
>>>> - normal write operations. As current snaphot counts as 0 there is not
>>>> operations here so do not write any data
>>>> - changing current snapshot, this is the worst case, you have to
>>>> increment for the current snapshot and decrement for the new so it will
>>>> take twice
>>>> - deleting snapshot, if is the current just set current_snapshot to a
>>>> dummy not existing value, if is not the current just decrement counters,
>>>> no performance changes
>>>
>>> How would you do cluster allocation if you don't have refcounts any more
>>> that can tell you if a cluster is used or not?
>>>
>>
>> You have refcount, is only that current snapshot counts as 0. An
>> example may help, start with "A" snapshot A counts as zero so all
>> refcounts are 0, now we create a snapshot "B" and make it current so
>> refcounts are 1
>>
>> A --- B
>>
>> If you change a cluster in snapshot "B" counts are still 1. If you go
>> back to "A" counters are increment (cause you leave B) and then
>> decrement (cause you enter in A).
>>
>> Perhaps the problem is how to distinguish 0 from "allocated in
>> current" and "not allocated". Yes, with which I suppose above it's a
>> problem, but we can easily use -1 as not allocated. If current and
>> refcount 0 mark as -1, if not current we would have to increment
>> counters of current, mark current as -1 than decrement for deleting,
>> yes in this case you have twice the time.
>
> I'm not sure I follow your last sentence but just having a different
> refcount value for "not allocated" vs "allocated" means allocating
> write requests will need to update refcounts.
>

Currently you have 0 for not allocated and >0 for allocated. If you
assume the current snapshot counts as 0, a 0 in the refcount could mean
a cluster that is allocated in the current snapshot and not shared with
other snapshots and, if you don't use -1, it could also mean a cluster
that is not allocated.
Another way of thinking about it: it is not that you don't update
refcounts, but that you update them with an addend of 0 (which in
practice does not change them).
The question was: is it possible to use this trick?

> But are non-append allocations common enough that we should bother
> with them in the allocating write path?  Can we append to the end of
> the image file for allocating writes and handle defragmentation
> elsewhere (i.e. get rid of unallocated clusters in the middle of the
> file)?
>
> Stefan
>

I think so, but it is better to have a way to know whether a cluster is
allocated without having to scan all L2 tables.

Frediano

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-22 14:21       ` Frediano Ziglio
@ 2011-07-22 16:47         ` Stefan Hajnoczi
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2011-07-22 16:47 UTC (permalink / raw)
  To: Frediano Ziglio; +Cc: Kevin Wolf, qemu-devel

On Fri, Jul 22, 2011 at 3:21 PM, Frediano Ziglio <freddy77@gmail.com> wrote:
> 2011/7/22 Stefan Hajnoczi <stefanha@gmail.com>:
>> On Fri, Jul 22, 2011 at 10:13 AM, Frediano Ziglio <freddy77@gmail.com> wrote:
>>> 2011/7/22 Kevin Wolf <kwolf@redhat.com>:
>>>> On 21.07.2011 18:17, Frediano Ziglio wrote:
>>>>> Hi,
>>>>>   after a snapshot is taken currently many write operations are quite
>>>>> slow due to
>>>>> - refcount updates (decrement old and increment new )
>>>>> - cluster allocation and file expansion
>>>>> - read-modify-write on partial clusters
>>>>>
>>>>> I found 2 way to improve refcount performance
>>>>>
>>>>> Method 1 - Lazy count
>>>>> Mainly do not take into account count for current snapshot, that is
>>>>> current snapshot counts as 0. This would require to add a
>>>>> current_snapshot in header and update refcount when current is changed.
>>>>> So for these operation
>>>>> - creating snapshot, performance are the same, just increment for old
>>>>> snapshot instead of the new one
>>>>> - normal write operations. As current snaphot counts as 0 there is not
>>>>> operations here so do not write any data
>>>>> - changing current snapshot, this is the worst case, you have to
>>>>> increment for the current snapshot and decrement for the new so it will
>>>>> take twice
>>>>> - deleting snapshot, if is the current just set current_snapshot to a
>>>>> dummy not existing value, if is not the current just decrement counters,
>>>>> no performance changes
>>>>
>>>> How would you do cluster allocation if you don't have refcounts any more
>>>> that can tell you if a cluster is used or not?
>>>>
>>>
>>> You have refcount, is only that current snapshot counts as 0. An
>>> example may help, start with "A" snapshot A counts as zero so all
>>> refcounts are 0, now we create a snapshot "B" and make it current so
>>> refcounts are 1
>>>
>>> A --- B
>>>
>>> If you change a cluster in snapshot "B" counts are still 1. If you go
>>> back to "A" counters are increment (cause you leave B) and then
>>> decrement (cause you enter in A).
>>>
>>> Perhaps the problem is how to distinguish 0 from "allocated in
>>> current" and "not allocated". Yes, with which I suppose above it's a
>>> problem, but we can easily use -1 as not allocated. If current and
>>> refcount 0 mark as -1, if not current we would have to increment
>>> counters of current, mark current as -1 than decrement for deleting,
>>> yes in this case you have twice the time.
>>
>> I'm not sure I follow your last sentence but just having a different
>> refcount value for "not allocated" vs "allocated" means allocating
>> write requests will need to update refcounts.
>>
>
> Now you have 0 for not allocated and >0 for allocated. If you assume
> current snapshot counting as 0 a 0 in refcount could mean an allocated
> cluster in current snapshot not shared with other snapshots and if you
> don't use -1 could be also a not allocated cluster.
> Thinking in another way is not that you don't update refcounts but you
> update refcounts with 0 addend (that's practically not changing
> refcounts).
> Question was: is possible to use this trick?
>
>> But are non-append allocations common enough that we should bother
>> with them in the allocating write path?  Can we append to the end of
>> the image file for allocating writes and handle defragmentation
>> elsewhere (i.e. get rid of unallocated clusters in the middle of the
>> file)?
>>
>> Stefan
>>
>
> I think so but is better to have a way to know if a cluster is
> allocated without having to scan all l2 tables.

You don't need to scan *all* L2 tables.  You just need to scan the
current L2 tables because the current "snapshot" doesn't affect
refcounts.
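
A rough Python sketch of that check, under the lazy-count assumption
that stored refcounts cover every snapshot except the current one. The
function name and the list-of-lists representation of L2 tables are
hypothetical simplifications.

```python
def used_clusters(stored_refcounts, current_l2_tables):
    """Return the set of clusters in use under the lazy-count scheme.

    stored_refcounts: {cluster -> refcount}, excluding the current snapshot.
    current_l2_tables: L2 tables of the current snapshot only; each entry is
    a physical cluster id, or None when the guest cluster is unallocated.
    """
    # Clusters referenced by any non-current snapshot have a stored count > 0.
    used = {c for c, n in stored_refcounts.items() if n > 0}
    # Clusters referenced only by the current snapshot show up in its L2 tables.
    for table in current_l2_tables:
        for phys in table:
            if phys is not None:
                used.add(phys)
    return used


stored = {0: 1, 1: 0, 2: 0, 3: 2}
l2_tables = [[0, None, 2], [None]]
in_use = used_clusters(stored, l2_tables)
```

Here cluster 2 has a stored refcount of 0 but is still in use, because
the current L2 tables reference it; cluster 1 is free.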

Stefan

* Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
  2011-07-22  9:13   ` Frediano Ziglio
  2011-07-22  9:29     ` Stefan Hajnoczi
@ 2011-07-22  9:30     ` Kevin Wolf
  1 sibling, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2011-07-22  9:30 UTC (permalink / raw)
  To: Frediano Ziglio; +Cc: qemu-devel

On 22.07.2011 11:13, Frediano Ziglio wrote:
> 2011/7/22 Kevin Wolf <kwolf@redhat.com>:
>> On 21.07.2011 18:17, Frediano Ziglio wrote:
>>> Hi,
>>>   after a snapshot is taken currently many write operations are quite
>>> slow due to
>>> - refcount updates (decrement old and increment new )
>>> - cluster allocation and file expansion
>>> - read-modify-write on partial clusters
>>>
>>> I found 2 way to improve refcount performance
>>>
>>> Method 1 - Lazy count
>>> Mainly do not take into account count for current snapshot, that is
>>> current snapshot counts as 0. This would require to add a
>>> current_snapshot in header and update refcount when current is changed.
>>> So for these operation
>>> - creating snapshot, performance are the same, just increment for old
>>> snapshot instead of the new one
>>> - normal write operations. As current snaphot counts as 0 there is not
>>> operations here so do not write any data
>>> - changing current snapshot, this is the worst case, you have to
>>> increment for the current snapshot and decrement for the new so it will
>>> take twice
>>> - deleting snapshot, if is the current just set current_snapshot to a
>>> dummy not existing value, if is not the current just decrement counters,
>>> no performance changes
>>
>> How would you do cluster allocation if you don't have refcounts any more
>> that can tell you if a cluster is used or not?
>>
> 
> You have refcount, is only that current snapshot counts as 0. An
> example may help, start with "A" snapshot A counts as zero so all
> refcounts are 0, now we create a snapshot "B" and make it current so
> refcounts are 1
> 
> A --- B
> 
> If you change a cluster in snapshot "B" counts are still 1. If you go
> back to "A" counters are increment (cause you leave B) and then
> decrement (cause you enter in A).
> 
> Perhaps the problem is how to distinguish 0 from "allocated in
> current" and "not allocated". Yes, with which I suppose above it's a
> problem, but we can easily use -1 as not allocated. If current and
> refcount 0 mark as -1, if not current we would have to increment
> counters of current, mark current as -1 than decrement for deleting,
> yes in this case you have twice the time.

Yes, this is the problem that I meant. If you use -1 for not allocated,
you're back to our current situation, just with refcount - 1 for each
cluster. In particular, you now need to update refcounts again on writes
(in order to change from -1 to 0).
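
To make the point concrete, here is a small Python sketch comparing the
allocation path with the conventional encoding (0 free, 1 used) and with
the proposed sentinel (-1 free, 0 used). The function and its return
convention are hypothetical; it only counts refcount entries written.

```python
def allocate(refcounts, free_value, used_value):
    """Mark the first free cluster as used.

    Returns (cluster index, number of refcount entries written).
    """
    for index, value in enumerate(refcounts):
        if value == free_value:
            refcounts[index] = used_value   # one refcount write either way
            return index, 1
    # No free cluster: grow the table, which is also one refcount write.
    refcounts.append(used_value)
    return len(refcounts) - 1, 1


conventional = [1, 0, 2]    # 0 means free
sentinel = [0, -1, 1]       # -1 means free, 0 means "used by current only"

conventional_result = allocate(conventional, free_value=0, used_value=1)
sentinel_result = allocate(sentinel, free_value=-1, used_value=0)
```

Both encodings write one refcount entry per allocating write, so in this
sketch the sentinel does not remove the update from the write path.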

>>> Method 2 - Read-only parent
>>> Here parents are readonly, instead of storing a refcount store a numeric
>>> id of the owner. If the owner is not current copy the cluster and change
>>> it. Considering this situation
>>>
>>> A --- B --- C
>>>
>>> B cannot be changed so in order to "change" B you have to create a new
>>> snapshot
>>>
>>> A --- B --- C
>>>          \--- D
>>>
>>> and change D. It can take more space cause you have in this case an
>>> additional snapshot.
>>>
>>> Operations:
>>> - creating snapshot, really fast as you don't have to change any
>>> ownership
>>> - normal write operations. If owner is not the same allocate a new
>>> cluster and just store a new owner for new cluster. Also ownership for
>>> past-to-end cluster could be set all to current owner in order to
>>> collapse allocations
>>> - changing current snapshot, no changes required for owners
>>> - deleting snapshot. Only possible if you have no child or a single
>>> child. Will require to scan all l2 tables and merge and update owner.
>>
>> I think this has similar characteristics as we have with external
>> snapshots (i.e. backing files). The advantage is that with applying it
>> to internal snapshots is that when deleting a snapshot you don't have to
>> copy around all the data.
>>
>> Probably this change could even be done transparently for the user, so
>> that B still appears to be writeable, but in fact refers to D now.
>>
>>
>> Anyway, have you checked how bad the refcount work really is? I think
>> that writing the VM state takes a lot longer, so that optimising the
>> refcount update may be the wrong approach, especially if it requires a
>> format change. My results with qemu-img snapshot suggest that it's not
>> worth it:
>>
>> kwolf@dhcp-5-188:~/images$ ~/source/qemu/qemu-img info scratch.qcow2
>> image: scratch.qcow2
>> file format: qcow2
>> virtual size: 8.0G (8589934592 bytes)
>> disk size: 4.0G
>> cluster_size: 65536
>> kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -c test
>> scratch.qcow2
>>
>> real    0m0.116s
>> user    0m0.009s
>> sys     0m0.040s
>> kwolf@dhcp-5-188:~/images$ time ~/source/qemu/qemu-img snapshot -d test
>> scratch.qcow2
>>
>> real    0m0.084s
>> user    0m0.011s
>> sys     0m0.044s
>>
>> Kevin
> 
> I'm not worried about time just taking snapshot more after taking
> snapshot during normal use. As you stated taking snapshot you can
> disable cache writethrough making it very fast but during normal
> operations you can't.

Well, the obvious solution is not using writethrough in this case. You
need it only for some broken guest OSes.

The other solution is adding a dirty flag which says that the refcount
on disk may not be accurate and the refcount must be rebuilt after a
crash. In this case you can drive the metadata cache in a writeback mode
even with cache=writethrough. This dirty flag is included in my proposal
for qcow2v3.
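
The dirty-flag idea can be sketched in Python roughly as follows. The
thread gives no details of the qcow2v3 proposal, so everything here is
an assumption: the class, its fields, and the rebuild step, which simply
recounts references from a list of referenced clusters.

```python
from collections import Counter


class DirtyFlagImage:
    """Hypothetical model: refcounts are cached and written back lazily."""

    def __init__(self):
        self.dirty_on_disk = False    # flag stored in the image header
        self.refcounts_on_disk = {}   # possibly stale while the flag is set
        self.cache = {}               # in-memory refcount updates

    def update_refcount(self, cluster, delta):
        if not self.dirty_on_disk:
            # The flag must reach the disk before any update is cached.
            self.dirty_on_disk = True
        base = self.cache.get(cluster, self.refcounts_on_disk.get(cluster, 0))
        self.cache[cluster] = base + delta

    def close(self):
        # Clean shutdown: write back the cache, then clear the flag.
        self.refcounts_on_disk.update(self.cache)
        self.cache.clear()
        self.dirty_on_disk = False

    def crash(self):
        # Cached updates are lost; the flag on disk stays set.
        self.cache.clear()

    def open(self, referenced_clusters):
        # referenced_clusters: one entry per reference found in the tables.
        if self.dirty_on_disk:
            self.refcounts_on_disk = dict(Counter(referenced_clusters))
            self.dirty_on_disk = False


image = DirtyFlagImage()
image.update_refcount(3, +1)
image.update_refcount(3, +1)
image.update_refcount(4, +1)
image.crash()
needed_rebuild = image.dirty_on_disk
image.open([3, 3, 4])
```

The cost moves from every write to the recovery path: after a crash the
whole refcount table is recomputed once.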

> Personally I'm pondering a log too to allow collapsing metadata
> updates. Even an external (another file) full log (with data) to try
> to reduce even overhead caused by read-modify-write during partial
> cluster updates and reduce file fragmentation. But as you can see from
> my patches I'm still exercising myself with Qemu code.

A journal is something to consider, yes. It's something that requires
some development effort, but long term I think it could provide some
nice advantages. I'm not sure if using it for the full data will help,
but for metadata it would certainly make sense.
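
As a rough illustration of how a metadata journal could collapse
updates, here is a toy Python sketch. No journal design is given in this
thread, so the class, the (cluster, delta) record format and the
checkpoint step are all hypothetical.

```python
class MetadataJournal:
    """Hypothetical journal: refcount deltas are logged, then applied in bulk."""

    def __init__(self):
        self.log = []            # sequentially appended (cluster, delta) records
        self.table = {}          # the refcount table proper
        self.table_writes = 0    # refcount entries actually rewritten

    def record(self, cluster, delta):
        # Cheap sequential append instead of an in-place table update.
        self.log.append((cluster, delta))

    def checkpoint(self):
        # Collapse all logged deltas per cluster before touching the table.
        net = {}
        for cluster, delta in self.log:
            net[cluster] = net.get(cluster, 0) + delta
        for cluster, delta in net.items():
            if delta != 0:
                self.table[cluster] = self.table.get(cluster, 0) + delta
                self.table_writes += 1
        self.log.clear()


journal = MetadataJournal()
journal.record(1, +1)
journal.record(1, -1)    # cancels the previous update
journal.record(2, +1)
journal.record(2, +1)
journal.checkpoint()
```

Four logged updates end up as a single table write in this example,
because the updates to cluster 1 cancel out and those to cluster 2 merge.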

Kevin
