From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache
Date: Mon, 24 Jan 2011 16:36:50 +0100
Message-ID: <4D3D9C92.2040609@redhat.com>
In-Reply-To: <AANLkTi=cSZwrdb814MceqkkULoAd0yyh=TRQK_nobBjw@mail.gmail.com>
On 24.01.2011 16:26, Stefan Hajnoczi wrote:
> On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> [ Re-adding qemu-devel to CC ]
>>
>> On 24.01.2011 15:34, Stefan Hajnoczi wrote:
>>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>>
>>>> if (m->nb_available & (s->cluster_sectors - 1)) {
>>>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
>>>> + cow = true;
>>>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
>>>> m->nb_available - end, s->cluster_sectors);
>>>> if (ret < 0)
>>>> goto err;
>>>> }
>>>>
>>>> - /* update L2 table */
>>>> + /*
>>>> + * Update L2 table.
>>>> + *
>>>> + * Before we update the L2 table to actually point to the new cluster, we
>>>> + * need to be sure that the refcounts have been increased and COW was
>>>> + * handled.
>>>> + */
>>>> + if (cow) {
>>>> + bdrv_flush(bs->file);
>>>
>>> Just bdrv_flush(bs->file) and not a refcounts cache flush?
>>
>> Refcounts and data need not be ordered against each other; they must
>> both be on disk only when we write the L2 table.
>
> Have I missed where refcounts actually get flushed from the cache out
> to disk? bdrv_flush(bs->file) only syncs the file but doesn't write
> out dirty data held in the cache.
The bdrv_flush isn't supposed to flush the refcounts, but the data
copied during COW (what you call pre/postfill in QED).

The refcounts are handled by the qcow2_cache_set_dependency call below,
which ensures that the refcounts are always written before the L2
tables.
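(As a rough illustration only, not the actual qcow2-cache.c code: the
dependency could be enforced by flushing the dependency cache before
writing back any entry of the dependent cache. The field name "depends"
and the helper below are assumptions for the sketch.)

/* Sketch: before an entry of a cache that has a dependency is written
 * back, flush the cache it depends on, so that e.g. "refcount blocks
 * before L2 tables" always holds on disk. */
static int cache_entry_flush_sketch(BlockDriverState *bs, Qcow2Cache *c,
    int i)
{
    int ret;

    if (c->depends) {
        /* Write back and sync the dependency first */
        ret = qcow2_cache_flush(bs, c->depends);
        if (ret < 0) {
            return ret;
        }
        c->depends = NULL;
    }

    /* ...now it is safe to write this cache's entry to bs->file... */
    return 0;
}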
>>>> + }
>>>> +
>>>> + qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
>>>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index);
>>>> if (ret < 0) {
>>>> goto err;
>>>> }
>>>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>>>
>>>> for (i = 0; i < m->nb_clusters; i++) {
>>>> /* if two concurrent writes happen to the same unallocated cluster
>>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
>>>> }
>>>>
>>>> - /*
>>>> - * Before we update the L2 table to actually point to the new cluster, we
>>>> - * need to be sure that the refcounts have been increased and COW was
>>>> - * handled.
>>>> - */
>>>> - bdrv_flush(bs->file);
>>>>
>>>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, m->nb_clusters);
>>>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>> if (ret < 0) {
>>>> - qcow2_l2_cache_reset(bs);
>>>> goto err;
>>>> }
>>>>
>>>
>>> The function continues like this:
>>>
>>> /*
>>> * If this was a COW, we need to decrease the refcount of the old cluster.
>>> * Also flush bs->file to get the right order for L2 and refcount update.
>>> */
>>> if (j != 0) {
>>> bdrv_flush(bs->file);
>>> for (i = 0; i < j; i++) {
>>> qcow2_free_any_clusters(bs,
>>> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1);
>>> }
>>> }
>>>
>>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount
>>> update"? We've just put an L2 table; should this be an L2 table
>>> flush?
>>
>> I agree, this looks wrong. What we need is a dependency to ensure
>> that the L2 table is flushed first and then the refcount block.
>> qcow2_free_any_clusters() indirectly calls update_refcount(), which
>> takes care of setting this dependency.
>>
>> So in the end I think it's just an unnecessary bdrv_flush. Makes sense?
>
> I don't understand this fully. I've noticed that the cache isn't the
> only mechanism for making changes to tables - there are functions like
> write_l2_entries() that directly write out parts of tables without
> honoring dependencies or using the dirty bit on the cache entry. I
> probably need to look at this more carefully to fully understand
> whether or not it is correct.
No, the cache should be the only mechanism used for accessing L2 tables
and refcount blocks. write_l2_entries() is an old function that this
patch removes.

Direct accesses should remain only for L1 tables and refcount tables.
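(For illustration, the intended access pattern through the cache then
looks roughly like the sketch below. It is pieced together from the
calls visible in this patch; qcow2_cache_get() is assumed from the
QcowCache patch, and error handling is trimmed.)

ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
    (void**) &l2_table);
if (ret < 0) {
    return ret;
}

/* Modify the table in memory and mark it dirty instead of writing it
 * out directly; the cache decides when it actually hits the disk. */
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);

ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);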
Kevin