qemu-devel.nongnu.org archive mirror
From: Fam Zheng <famz@redhat.com>
To: Hu Tao <hutao@cn.fujitsu.com>, Peter Lieven <pl@kamp.de>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH v2 5/6] qcow2: implement bdrv_preallocate
Date: Mon, 16 Dec 2013 17:21:23 +0800
Message-ID: <52AEC613.50800@redhat.com>
In-Reply-To: <20131211073339.GE17024@G08FNSTD100614.fnst.cn.fujitsu.com>

On 2013-12-11 15:33, Hu Tao wrote:
> On Thu, Nov 28, 2013 at 11:03:04AM +0100, Peter Lieven wrote:
>> On 28.11.2013 09:48, Hu Tao wrote:
>>> On Wed, Nov 27, 2013 at 11:13:40AM +0100, Peter Lieven wrote:
>>>> On 27.11.2013 11:07, Fam Zheng wrote:
>>>>> On 2013-11-27 18:03, Peter Lieven wrote:
>>>>>> On 27.11.2013 07:40, Fam Zheng wrote:
>>>>>>> On 2013-11-27 14:01, Hu Tao wrote:
>>>>>>>> On Wed, Nov 27, 2013 at 11:01:23AM +0800, Fam Zheng wrote:
>>>>>>>>> On 2013-11-27 10:15, Hu Tao wrote:
>>>>>>>>>> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
>>>>>>>>>> ---
>>>>>>>>>>     block/qcow2.c | 7 +++++++
>>>>>>>>>>     1 file changed, 7 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>>>>>>>>> index b054a01..a23fade 100644
>>>>>>>>>> --- a/block/qcow2.c
>>>>>>>>>> +++ b/block/qcow2.c
>>>>>>>>>> @@ -2180,6 +2180,12 @@ static int qcow2_amend_options(BlockDriverState *bs,
>>>>>>>>>>         return 0;
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> +static int qcow2_preallocate(BlockDriverState *bs, int64_t offset,
>>>>>>>>>> +                             int64_t length)
>>>>>>>>>> +{
>>>>>>>>>> +    return bdrv_preallocate(bs->file, offset, length);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>> What's the semantics of .bdrv_preallocate? I think you should map
>>>>>>>>> [offset, offset + length) to clusters in image file, and then
>>>>>>>>> forward to bs->file, rather than this direct wrapper.
>>>>>>>>>
>>>>>>>>> E.g. bdrv_preallocate(qcow2_bs, 0, cluster_size) should call
>>>>>>>>> bdrv_preallocate(qcow2_bs->file, offset_of_first_cluster,
>>>>>>>>> cluster_size).
>>>>>>>> You mean data clusters here, right? Is there a single function to get
>>>>>>>> the offset of the first data cluster?
>>>>>>>>
>>>>>>> There is a function, qcow2_get_cluster_offset.
>>>>>> This should return no valid offset as long as the cluster is not allocated.
>>>>>>
>>>>>> I think you actually have to "write" all clusters of a qcow2 one by one.
>>>>>> Possibly this write could be a fallocate call instead of a zero write.
>>>>>>
>>>>> Yes, I was wrong about qcow2_get_cluster_offset. The logic here is more like cluster allocation in qcow2_alloc_cluster_offset. Maybe we can reuse that.
>>>> What I don't like about the preallocation is that we would lose the information that a cluster contains no valid data and would read it, e.g., during
>>>> conversion.
>>> So the information is stored in the table, and you mean we shouldn't clear
>>> the table when doing preallocation? I'm not sure how the information could be
>>> useful on a newly-created image, but it seems ideal to keep the information
>>> in the table.
>> When you later want to, e.g., convert this qcow2, performance is lower than needed because
>> you read all those preallocated sectors although you could know they are empty.
>>>
>>>> I think what we want is a preallocated image with all clusters sequentially mapped into the qcow2 file. Preallocate all the cluster extents, but still
>>>> have the information in the table that the cluster in fact has no valid data. So we would need a valid cluster offset while still having the
>>>> flag that the cluster is unallocated. I think this would require carefully checking whether all the cluster functions can easily cope with this.
>>>>
>>>> The question is, Hu, what do you want to achieve? Do you want the space on the filesystem to be preallocated so you can't overcommit, or
>>>> do you also want a sequential mapping of all the clusters into the file?
>>> The goal is to avoid sparse files, as they can cause performance problems. So
>>> the first one. I'm not sure about the second, but IIUC, one fallocate()
>>> is enough for all clusters if they are sequentially mapped.
>> If you do not premap them, they are allocated in the order they are written.
>> So if you are going to preallocate the whole file anyway, you should sequentially map all clusters into the file
>> AND still keep the information that they are in fact not yet written.
>
> Can this be achieved by first calling fallocate() on the disk file, then allocating
> metadata? This way all metadata clusters are allocated before any data
> clusters, leaving all data clusters at the end of the file.
>
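
Fam's first suggestion in the thread (map [offset, offset + length) to clusters in the image file before forwarding to bs->file) starts with working out which guest clusters a byte range touches. A minimal sketch of that step only, assuming cluster_size = 1 << cluster_bits as in qcow2; ClusterRange and guest_range_to_clusters are hypothetical names, not QEMU functions, and looking up or allocating the host offsets for those clusters (the job the thread points to qcow2_alloc_cluster_offset for) is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which guest clusters a byte range covers. */
typedef struct {
    uint64_t first_cluster; /* index of the first guest cluster touched */
    uint64_t nb_clusters;   /* number of guest clusters touched */
} ClusterRange;

/* Hypothetical helper: map the guest byte range [offset, offset + length)
 * to a range of guest cluster indices, with cluster_size = 1 << cluster_bits. */
static ClusterRange guest_range_to_clusters(uint64_t offset, uint64_t length,
                                            unsigned cluster_bits)
{
    ClusterRange r;
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;

    assert(length > 0);
    r.first_cluster = offset >> cluster_bits;
    /* Round the end of the range up to the next cluster boundary. */
    uint64_t end_cluster = (offset + length + cluster_size - 1) >> cluster_bits;
    r.nb_clusters = end_cluster - r.first_cluster;
    return r;
}
```

For example, with 64 KiB clusters (cluster_bits = 16), a 2-byte range starting at offset 65535 straddles a cluster boundary and therefore touches two clusters, so preallocation has to cover both.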

I think Peter means you need to sequentially map clusters into the
file, so that sequential IO in the guest is translated to sequential IO on
the image file.
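What sequential mapping buys can be shown with a small sketch, assuming a hypothetical layout in which all metadata sits before a cluster-aligned data area starting at data_start (roughly the layout Hu Tao proposes above); sequential_host_offset is an illustrative helper, not part of QEMU:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: metadata first, then a cluster-aligned data area at
 * data_start. Guest cluster i lives at data_start + i * cluster_size, so
 * neighbouring guest clusters are also neighbours in the image file. */
static uint64_t sequential_host_offset(uint64_t data_start, uint64_t guest_offset,
                                       unsigned cluster_bits)
{
    uint64_t cluster_mask = (UINT64_C(1) << cluster_bits) - 1;

    assert((data_start & cluster_mask) == 0);
    uint64_t guest_cluster = guest_offset >> cluster_bits;
    /* Start of the host cluster plus the offset inside that cluster. */
    return data_start + (guest_cluster << cluster_bits) + (guest_offset & cluster_mask);
}
```

Under that assumption the mapping reduces to data_start + guest_offset, so a sequential guest read stays a sequential read of the image file; clusters allocated in write order, as Peter describes, give no such guarantee.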

fallocate() or posix_fallocate() should work. You need to set the zero flag
on the allocated cluster when mapping it in L2, instead of actually
writing zeros.
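
A rough sketch of the paragraph above, assuming a qcow2 version 3 image (the zero flag, bit 0 of a standard L2 entry, is always 0 in version 2): reserve the host cluster with posix_fallocate() and describe it with an L2 entry that carries the host offset plus the zero flag, so the guest still reads zeros and the "no valid data" information Peter wants to keep is not lost. The bit layout follows the qcow2 specification; make_zero_prealloc_l2_entry, prealloc_zero_cluster and the L2_* macros are hypothetical names defined here, and updating refcounts and writing the L2 table back (big-endian on disk) are left out:

```c
#define _POSIX_C_SOURCE 200809L   /* for posix_fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* Standard-cluster L2 entry bits, per the qcow2 specification. */
#define L2_ZERO_FLAG   (UINT64_C(1) << 0)            /* reads as zeros (v3 only) */
#define L2_COPIED_FLAG (UINT64_C(1) << 63)           /* refcount is exactly 1 */
#define L2_OFFSET_MASK UINT64_C(0x00fffffffffffe00)  /* host offset, bits 9-55 */

/* Hypothetical helper: CPU-endian L2 entry for a preallocated zero cluster.
 * On disk the entry would have to be stored big-endian. */
static uint64_t make_zero_prealloc_l2_entry(uint64_t host_offset)
{
    /* Offset must be at least 512-byte aligned and fit in bits 9-55. */
    assert((host_offset & ~L2_OFFSET_MASK) == 0);
    return (host_offset & L2_OFFSET_MASK) | L2_COPIED_FLAG | L2_ZERO_FLAG;
}

/* Hypothetical helper: reserve the host cluster without writing zeros, then
 * hand back its L2 entry. Returns 0 on success or an errno value;
 * *l2_entry is left untouched on error. */
static int prealloc_zero_cluster(int fd, uint64_t host_offset,
                                 uint64_t cluster_size, uint64_t *l2_entry)
{
    uint64_t entry = make_zero_prealloc_l2_entry(host_offset);

    /* posix_fallocate() reports errors via its return value, not errno. */
    int ret = posix_fallocate(fd, (off_t)host_offset, (off_t)cluster_size);
    if (ret != 0) {
        return ret;
    }
    *l2_entry = entry;
    return 0;
}
```

Keeping a valid host offset in an entry that still reads as zeros is what lets a later conversion skip these clusters instead of reading them back.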

Fam


Thread overview: 22+ messages
2013-11-27  2:15 [Qemu-devel] [RFC PATCH v2 0/6] qemu-img: add preallocation=full Hu Tao
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 1/6] block: introduce prealloc_mode Hu Tao
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 2/6] block: add BlockDriver.bdrv_preallocate Hu Tao
2013-11-27  2:35   ` Fam Zheng
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 3/6] block/raw-posix: implement bdrv_preallocate Hu Tao
2013-11-27  2:40   ` Fam Zheng
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 4/6] raw-posix: Add full image preallocation option Hu Tao
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 5/6] qcow2: implement bdrv_preallocate Hu Tao
2013-11-27  3:01   ` Fam Zheng
2013-11-27  6:01     ` Hu Tao
2013-11-27  6:40       ` Fam Zheng
2013-11-27 10:03         ` Peter Lieven
2013-11-27 10:07           ` Fam Zheng
2013-11-27 10:13             ` Peter Lieven
2013-11-28  8:48               ` Hu Tao
2013-11-28 10:03                 ` Peter Lieven
2013-12-11  7:33                   ` Hu Tao
2013-12-16  8:24                     ` Hu Tao
2013-12-16  9:21                     ` Fam Zheng [this message]
2013-12-17  2:03                       ` Hu Tao
2013-11-27  2:15 ` [Qemu-devel] [RFC PATCH v2 6/6] qcow2: Add full image preallocation option Hu Tao
2013-11-27  3:22 ` [Qemu-devel] [RFC PATCH v2 0/6] qemu-img: add preallocation=full Fam Zheng