From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache
Date: Mon, 24 Jan 2011 16:36:50 +0100 [thread overview]
Message-ID: <4D3D9C92.2040609@redhat.com> (raw)
In-Reply-To: <AANLkTi=cSZwrdb814MceqkkULoAd0yyh=TRQK_nobBjw@mail.gmail.com>
Am 24.01.2011 16:26, schrieb Stefan Hajnoczi:
> On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> [ Re-adding qemu-devel to CC ]
>>
>> Am 24.01.2011 15:34, schrieb Stefan Hajnoczi:
>>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>>
>>>> if (m->nb_available & (s->cluster_sectors - 1)) {
>>>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
>>>> + cow = true;
>>>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
>>>> m->nb_available - end, s->cluster_sectors);
>>>> if (ret < 0)
>>>> goto err;
>>>> }
>>>>
>>>> - /* update L2 table */
>>>> + /*
>>>> + * Update L2 table.
>>>> + *
>>>> + * Before we update the L2 table to actually point to the new cluster, we
>>>> + * need to be sure that the refcounts have been increased and COW was
>>>> + * handled.
>>>> + */
>>>> + if (cow) {
>>>> + bdrv_flush(bs->file);
>>>
>>> Just bdrv_flush(bs->file) and not a refcounts cache flush?
>>
>> Refcounts and data need not to be ordered against each other. They only
>> must both be on disk when we write the L2 table.
>
> Have I missed where refcounts actually get flushed from the cache out
> to the disk because bdrv_flush(bs->file) only syncs the file but
> doesn't write out dirty data held in cache.
The bdrv_flush isn't supposed to flush the refcounts, but the data
copied during COW (what you call pre/postfill in QED)
The refcounts are handled by the qcow2_cache_set_dependency below, so
that before writing the L2 tables we always write the refcounts first.
>>>> + }
>>>> +
>>>> + qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
>>>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index);
>>>> if (ret < 0) {
>>>> goto err;
>>>> }
>>>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>>>
>>>> for (i = 0; i < m->nb_clusters; i++) {
>>>> /* if two concurrent writes happen to the same unallocated cluster
>>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
>>>> }
>>>>
>>>> - /*
>>>> - * Before we update the L2 table to actually point to the new cluster, we
>>>> - * need to be sure that the refcounts have been increased and COW was
>>>> - * handled.
>>>> - */
>>>> - bdrv_flush(bs->file);
>>>>
>>>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, m->nb_clusters);
>>>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>> if (ret < 0) {
>>>> - qcow2_l2_cache_reset(bs);
>>>> goto err;
>>>> }
>>>>
>>>
>>> The function continues like this:
>>>
>>> /*
>>> * If this was a COW, we need to decrease the refcount of the old cluster.
>>> * Also flush bs->file to get the right order for L2 and refcount update.
>>> */
>>> if (j != 0) {
>>> bdrv_flush(bs->file);
>>> for (i = 0; i < j; i++) {
>>> qcow2_free_any_clusters(bs,
>>> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1);
>>> }
>>> }
>>>
>>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount
>>> update"? We've just put an L2 table, should this be an L2 table
>>> flush?
>>
>> I agree, this looks wrong. What we need is a dependency to ensure that
>> first L2 is flushed and then the refcount block.
>> qcow2_free_any_clusters() calls update_refcount() indirectly, which
>> takes care of setting this dependency.
>>
>> So in the end I think it's just an unnecessary bdrv_flush. Makes sense?
>
> I don't understand this fully. I've noticed that the cache isn't the
> only mechanism for making changes to tables - there are functions like
> write_l2_entries() that directly write out parts of tables without
> honoring dependencies or using the dirty bit on the cache entry. I
> probably need to look at this more carefully to fully understand
> whether or not it is correct.
No, the cache should be the only mechanism that is used for accessing L2
tables and refcount blocks. write_l2_entries() is an old function that
is removed by the patch.
Direct accesses should only be left for L1 tables and refcount tables.
Kevin
next prev parent reply other threads:[~2011-01-24 15:35 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-20 17:10 [Qemu-devel] [PATCH v4 0/3] qcow2 metadata cache Kevin Wolf
2011-01-20 17:10 ` [Qemu-devel] [PATCH v4 1/3] qcow2: Add QcowCache Kevin Wolf
2011-01-24 14:00 ` Stefan Hajnoczi
2011-01-20 17:10 ` [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache Kevin Wolf
[not found] ` <AANLkTimeRQVVeNnRbaiRNQ-SbExyJDZiuteBP7wfv9He@mail.gmail.com>
2011-01-24 14:54 ` Kevin Wolf
2011-01-24 15:26 ` Stefan Hajnoczi
2011-01-24 15:36 ` Kevin Wolf [this message]
2011-01-24 15:39 ` Stefan Hajnoczi
2011-02-09 11:19 ` Avi Kivity
2011-02-09 11:23 ` Avi Kivity
2011-01-20 17:10 ` [Qemu-devel] [PATCH v4 3/3] qcow2: Batch flushes for COW Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D3D9C92.2040609@redhat.com \
--to=kwolf@redhat.com \
--cc=Qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.