From: Fam Zheng <famz@redhat.com>
To: Hu Tao <hutao@cn.fujitsu.com>, Peter Lieven <pl@kamp.de>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH v2 5/6] qcow2: implement bdrv_preallocate
Date: Mon, 16 Dec 2013 17:21:23 +0800 [thread overview]
Message-ID: <52AEC613.50800@redhat.com> (raw)
In-Reply-To: <20131211073339.GE17024@G08FNSTD100614.fnst.cn.fujitsu.com>
On 2013年12月11日 15:33, Hu Tao wrote:
> On Thu, Nov 28, 2013 at 11:03:04AM +0100, Peter Lieven wrote:
>> On 28.11.2013 09:48, Hu Tao wrote:
>>> On Wed, Nov 27, 2013 at 11:13:40AM +0100, Peter Lieven wrote:
>>>> Am 27.11.2013 11:07, schrieb Fam Zheng:
>>>>> On 2013年11月27日 18:03, Peter Lieven wrote:
>>>>>> Am 27.11.2013 07:40, schrieb Fam Zheng:
>>>>>>> On 2013年11月27日 14:01, Hu Tao wrote:
>>>>>>>> On Wed, Nov 27, 2013 at 11:01:23AM +0800, Fam Zheng wrote:
>>>>>>>>> On 2013年11月27日 10:15, Hu Tao wrote:
>>>>>>>>>> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
>>>>>>>>>> ---
>>>>>>>>>> block/qcow2.c | 7 +++++++
>>>>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>>>>>>>>> index b054a01..a23fade 100644
>>>>>>>>>> --- a/block/qcow2.c
>>>>>>>>>> +++ b/block/qcow2.c
>>>>>>>>>> @@ -2180,6 +2180,12 @@ static int qcow2_amend_options(BlockDriverState *bs,
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> +static int qcow2_preallocate(BlockDriverState *bs, int64_t offset,
>>>>>>>>>> + int64_t length)
>>>>>>>>>> +{
>>>>>>>>>> + return bdrv_preallocate(bs->file, offset, length);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>> What's the semantics of .bdrv_preallocate? I think you should map
>>>>>>>>> [offset, offset + length) to clusters in image file, and then
>>>>>>>>> forward to bs->file, rather than this direct wrapper.
>>>>>>>>>
>>>>>>>>> E.g. bdrv_preallocate(qcow2_bs, 0, cluster_size) should call
>>>>>>>>> bdrv_preallocate(qcow2_bs->file, offset_off_first_cluster,
>>>>>>>>> cluster_size).
>>>>>>>> You mean data clusters here, right? Is there a single function to get
>>>>>>>> the offset of the first data cluster?
>>>>>>>>
>>>>>>> There is a function, qcow2_get_cluster_offset.
>>>>>> This should return no valid offset as long as the cluster is not allocated.
>>>>>>
>>>>>> I think you actually have to "write" all clusters of a qcow2 one by one.
>>>>>> Eventually this write could be an fallocate call instead of a zero write.
>>>>>>
>>>>> Yes, I was wrong about qcow2_get_cluster_offset. The logic here is more like cluster allocation in qcow2_alloc_cluster_offset. Maybe we can reuse that.
>>>> What I don't like about the preallocation is that we would loose the information that a cluster contains no valid data and would read it e.g. during
>>>> conversion.
>>> So the information is stored in table and you mean we shouldn't clear
>>> table when do preallocation? I'm not sure how the information could be
>>> useful on a newly-created image, but it seems ideal to keep informations
>>> in table.
>> When you want to e.g. convert this qcow2 later the performance is lower than needed because
>> you read all those preallocated sectors altough you could now they are empty.
>>>
>>>> I think what we want is a preallocated image with all clusters sequentally mapped into the qcow2 file. Preallocate all the cluster extends, but still
>>>> have the information in the table that the cluster in fact has no valid data. So we would need a valid cluster offset while still haveing the
>>>> flag that the cluster is unallocated. I think this would require thoughtfully checking all the cluster functions if they can easily cope with this.
>>>>
>>>> The quetion is Hu, what do you want to achieve? Do you want that the space on the filesystem is preallocated so you can't overcommit or
>>>> do you also want a sequential mapping of all the clusters into the file?
>>> The goal is to avoid sparse file as it can cause performance problem. So
>>> the first one. I'm not sure about the second but IIUC, one fallocate()
>>> is enough for all clusters if they are sequentially mapped.
>> If you do not premap them they are allocated in the order they are written.
>> So if you are going to preallocate the whole file anyway, you should sequentally map all clusters into the file
>> AND still keep the information that they are in fact not yet written.
>
> Can this be achieved by first fallocate() the disk file, then allocate
> metadata? This way all metadata clusters are allocated before any data
> clusters, leaving all data clusters at the end of file.
>
I think Peter means your need to sequentially map clusters into the
file, so that sequential IO in guest is translated to sequential IO on
the image file.
fallocate() or posix_fallocate() should work. You need to set zero flag
on the allocated cluster when mapping it in L2, instead of actually
writing zeros.
Fam
next prev parent reply other threads:[~2013-12-16 9:22 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-27 2:15 [Qemu-devel] [RFC PATCH v2 0/6] qemu-img: add preallocation=full Hu Tao
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 1/6] block: introduce prealloc_mode Hu Tao
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 2/6] block: add BlockDriver.bdrv_preallocate Hu Tao
2013-11-27 2:35 ` Fam Zheng
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 3/6] block/raw-posix: implement bdrv_preallocate Hu Tao
2013-11-27 2:40 ` Fam Zheng
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 4/6] raw-posix: Add full image preallocation option Hu Tao
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 5/6] qcow2: implement bdrv_preallocate Hu Tao
2013-11-27 3:01 ` Fam Zheng
2013-11-27 6:01 ` Hu Tao
2013-11-27 6:40 ` Fam Zheng
2013-11-27 10:03 ` Peter Lieven
2013-11-27 10:07 ` Fam Zheng
2013-11-27 10:13 ` Peter Lieven
2013-11-28 8:48 ` Hu Tao
2013-11-28 10:03 ` Peter Lieven
2013-12-11 7:33 ` Hu Tao
2013-12-16 8:24 ` Hu Tao
2013-12-16 9:21 ` Fam Zheng [this message]
2013-12-17 2:03 ` Hu Tao
2013-11-27 2:15 ` [Qemu-devel] [RFC PATCH v2 6/6] qcow2: Add full image preallocation option Hu Tao
2013-11-27 3:22 ` [Qemu-devel] [RFC PATCH v2 0/6] qemu-img: add preallocation=full Fam Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52AEC613.50800@redhat.com \
--to=famz@redhat.com \
--cc=hutao@cn.fujitsu.com \
--cc=kwolf@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).