qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: [Qemu-devel] [RFC V6 27/33] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
       [not found]           ` <20130228161418.GA4214@irqsave.net>
@ 2013-03-01  8:59             ` Kevin Wolf
  2013-03-01  9:34               ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin Wolf @ 2013-03-01  8:59 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel, Stefan Hajnoczi

Am 28.02.2013 um 17:14 hat Benoît Canet geschrieben:
> Le Thursday 28 Feb 2013 à 11:14:34 (+0100), Kevin Wolf a écrit :
> > Am 28.02.2013 um 10:41 hat Stefan Hajnoczi geschrieben:
> > > On Wed, Feb 27, 2013 at 04:00:28PM +0100, Benoît Canet wrote:
> > > > > > -                if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
> > > > > > +                if (!s->has_dedup &&
> > > > > > +                    (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
> > > > > > +                    fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
> > > > > > +                        PRIx64 " refcount=%d\n", l2_entry, refcount);
> > > > > > +                    res->corruptions++;
> > > > > > +                }
> > > > > 
> > > > > Why is this warning suppressed when dedup is enabled?  The meaning of
> > > > > QCOW_OFLAG_COPIED is that refcount == 1.  If this invariant is violated
> > > > > then something is wrong.
> > > > 
> > > > When deduplication is done refcount will be bigger than one and
> > > > QCOW_OFLAG_COPIED will be cleared.
> > > > 
> > > > Then if enough logical clustere pointing to the same physical cluster are
> > > > rewritten with something else the refcount will goes down back to one.
> > > > 
> > > > But this time QCOW_OFLAG_COPIED can be set back so this equality won't be true.
> > > 
> > > When the refcount decreases to 1 again we need to set QCOW_OFLAG_COPIED
> > > again.  qcow2-snapshot.c:qcow2_snapshot_delete() does this with:
> > > 
> > >     /* must update the copied flag on the current cluster offsets */
> > >     ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 0);
> > > 
> > > Is dedup not restoring QCOW_OFLAG_COPIED?
> > 
> > This is a very expensive operation. I don't think that you can do it for
> > each deduplicated cluster that is overwritten. Not doing it comes with
> > the cost of doing more COW than is actually needed. And we need to
> > mention in the spec that QCOW_OFLAG_COPIED can be missing on clusters
> > with deduplication enabled.
> 
> Also when two logical clusters point to the same physical cluster and one of the
> logical cluster get overwritten the deduplication code has no way to know the
> index of the last logical cluster entry.

Well, strictly speaking you can. The qcow2_update_snapshot_refcount()
call that Stefan mention does exactly that. It's just insanely expensive
because it has to look at the refcounts for all clusters.

Kevin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [RFC V6 27/33] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  2013-03-01  8:59             ` [Qemu-devel] [RFC V6 27/33] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Kevin Wolf
@ 2013-03-01  9:34               ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2013-03-01  9:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Benoît Canet, qemu-devel

On Fri, Mar 01, 2013 at 09:59:33AM +0100, Kevin Wolf wrote:
> Am 28.02.2013 um 17:14 hat Benoît Canet geschrieben:
> > Le Thursday 28 Feb 2013 à 11:14:34 (+0100), Kevin Wolf a écrit :
> > > Am 28.02.2013 um 10:41 hat Stefan Hajnoczi geschrieben:
> > > > On Wed, Feb 27, 2013 at 04:00:28PM +0100, Benoît Canet wrote:
> > > > > > > -                if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
> > > > > > > +                if (!s->has_dedup &&
> > > > > > > +                    (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
> > > > > > > +                    fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
> > > > > > > +                        PRIx64 " refcount=%d\n", l2_entry, refcount);
> > > > > > > +                    res->corruptions++;
> > > > > > > +                }
> > > > > > 
> > > > > > Why is this warning suppressed when dedup is enabled?  The meaning of
> > > > > > QCOW_OFLAG_COPIED is that refcount == 1.  If this invariant is violated
> > > > > > then something is wrong.
> > > > > 
> > > > > When deduplication is done refcount will be bigger than one and
> > > > > QCOW_OFLAG_COPIED will be cleared.
> > > > > 
> > > > > Then if enough logical clustere pointing to the same physical cluster are
> > > > > rewritten with something else the refcount will goes down back to one.
> > > > > 
> > > > > But this time QCOW_OFLAG_COPIED can be set back so this equality won't be true.
> > > > 
> > > > When the refcount decreases to 1 again we need to set QCOW_OFLAG_COPIED
> > > > again.  qcow2-snapshot.c:qcow2_snapshot_delete() does this with:
> > > > 
> > > >     /* must update the copied flag on the current cluster offsets */
> > > >     ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 0);
> > > > 
> > > > Is dedup not restoring QCOW_OFLAG_COPIED?
> > > 
> > > This is a very expensive operation. I don't think that you can do it for
> > > each deduplicated cluster that is overwritten. Not doing it comes with
> > > the cost of doing more COW than is actually needed. And we need to
> > > mention in the spec that QCOW_OFLAG_COPIED can be missing on clusters
> > > with deduplication enabled.
> > 
> > Also when two logical clusters point to the same physical cluster and one of the
> > logical cluster get overwritten the deduplication code has no way to know the
> > index of the last logical cluster entry.
> 
> Well, strictly speaking you can. The qcow2_update_snapshot_refcount()
> call that Stefan mention does exactly that. It's just insanely expensive
> because it has to look at the refcounts for all clusters.

Okay, I agree that qcow2_update_snapshot_refcount() is too expensive.

Please add a comment explaining that QCOW_OFLAG_COPIED is not guaranteed
when dedup is enabled since it would be too expensive to do this
everything sharing breaks (refcount is decremented to 1).

Stefan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [RFC V6 13/33] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe
       [not found]   ` <20130207094814.GH1081@stefanha-thinkpad.redhat.com>
@ 2013-03-11 12:59     ` Benoît Canet
  0 siblings, 0 replies; 5+ messages in thread
From: Benoît Canet @ 2013-03-11 12:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

Le Thursday 07 Feb 2013 à 10:48:14 (+0100), Stefan Hajnoczi a écrit :
> On Wed, Feb 06, 2013 at 01:31:46PM +0100, Benoît Canet wrote:
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index ef91216..5b1d20d 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -710,6 +710,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
> >  
> >      for (i = 0; i < m->nb_clusters; i++) {
> >          uint64_t flags = 0;
> > +        uint64_t offset = cluster_offset + (i << s->cluster_bits);
> >          /* if two concurrent writes happen to the same unallocated cluster
> >  	 * each write allocates separate cluster and writes data concurrently.
> >  	 * The first one to complete updates l2 table with pointer to its
> > @@ -722,8 +723,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
> >          flags = m->oflag_copied ? QCOW_OFLAG_COPIED : 0;
> >          flags |= m->to_deduplicate ? QCOW_OFLAG_TO_DEDUP : 0;
> >  
> > -        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> > -                    (i << s->cluster_bits)) | flags);
> > +        l2_table[l2_index + i] = cpu_to_be64(offset | flags);
> > +
> > +        /* make the deduplication forget the cluster to avoid making
> > +         * the dedup pointing to a cluster that has changed on it's back.
> > +         */
> > +        if (m->to_deduplicate) {
> > +            qcow2_dedup_forget_cluster_by_sector(bs, offset >> 9);
> > +        }
> 
> This does not play well with internal snapshots.  Imagine that an
> internal snapshot was taken, so refcount == 2.
> 
> Now the cluster is overwritten by the guest but we still need to hang on
> to the original data since the snapshot refers to it.
> 
> If dedup forgets about the cluster then dedup is only effective for the
> current disk image, but it ignores snapshots.  Ideally dedup would take
> snapshots into account since they share the same data clusters as the
> current image.
> 
> Stefan

Is checking the refcount with a qcow2_get_refcounts(index) a solution ? 

Benoît

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [RFC V6 19/33] block: Add qcow2_dedup format and image creation code.
       [not found]   ` <20130207101621.GM1081@stefanha-thinkpad.redhat.com>
@ 2013-03-11 15:20     ` Benoît Canet
  2013-03-12  9:33       ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Benoît Canet @ 2013-03-11 15:20 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

> > +    if (dedup) {
> > +        version = 3;
> > +    }
> 
> Lazy refcounts don't force the version.  It would be consistent to
> refrain from forcing the version too.

I don't understand. Aren't the incompatible features bits implying version = 3 ?
When compat=1.1 is set the code force version to 3.

Benoît

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [RFC V6 19/33] block: Add qcow2_dedup format and image creation code.
  2013-03-11 15:20     ` [Qemu-devel] [RFC V6 19/33] block: Add qcow2_dedup format and image creation code Benoît Canet
@ 2013-03-12  9:33       ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2013-03-12  9:33 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

On Mon, Mar 11, 2013 at 04:20:03PM +0100, Benoît Canet wrote:
> > > +    if (dedup) {
> > > +        version = 3;
> > > +    }
> > 
> > Lazy refcounts don't force the version.  It would be consistent to
> > refrain from forcing the version too.
> 
> I don't understand. Aren't the incompatible features bits implying version = 3 ?
> When compat=1.1 is set the code force version to 3.

Only compat= affects the version.  Lazy refcounts and dedup should be
orthogonal.

If you specify lazy_refcounts=on without compat=1.1 you get an error.

Same should apply to deduplication.  Otherwise the qemu-img create
behavior is inconsistent - you get an error for lazy_refcounts but no
error and silent version=3 for deduplication.

Stefan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-03-12  9:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1360153926-9492-1-git-send-email-benoit@irqsave.net>
     [not found] ` <1360153926-9492-28-git-send-email-benoit@irqsave.net>
     [not found]   ` <20130208105746.GD7835@stefanha-thinkpad.redhat.com>
     [not found]     ` <20130227150028.GC4010@irqsave.net>
     [not found]       ` <20130228094139.GB12748@stefanha-thinkpad.redhat.com>
     [not found]         ` <20130228101434.GD2429@dhcp-200-207.str.redhat.com>
     [not found]           ` <20130228161418.GA4214@irqsave.net>
2013-03-01  8:59             ` [Qemu-devel] [RFC V6 27/33] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Kevin Wolf
2013-03-01  9:34               ` Stefan Hajnoczi
     [not found] ` <1360153926-9492-14-git-send-email-benoit@irqsave.net>
     [not found]   ` <20130207094814.GH1081@stefanha-thinkpad.redhat.com>
2013-03-11 12:59     ` [Qemu-devel] [RFC V6 13/33] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
     [not found] ` <1360153926-9492-20-git-send-email-benoit@irqsave.net>
     [not found]   ` <20130207101621.GM1081@stefanha-thinkpad.redhat.com>
2013-03-11 15:20     ` [Qemu-devel] [RFC V6 19/33] block: Add qcow2_dedup format and image creation code Benoît Canet
2013-03-12  9:33       ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).