Message-ID: <52BEEFD7.4060405@kamp.de>
Date: Sat, 28 Dec 2013 16:35:51 +0100
From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC PATCH] qcow2: add a readahead cache for qcow2_decompress_cluster
In-Reply-To: <52BCF2B1.8050108@redhat.com>
To: Fam Zheng, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, ronniesahlberg@gmail.com, stefanha@redhat.com

On 27.12.2013 04:23, Fam Zheng wrote:
> On 2013-12-27 00:19, Peter Lieven wrote:
>> While evaluating compressed qcow2 images as a basis for virtual
>> machine templates, I found that there are a lot of partly redundant
>> (compressed clusters share common physical sectors) and relatively
>> short reads.
>>
>> This doesn't hurt if the image resides on a local filesystem, where
>> we can benefit from the local page cache, but it adds a lot of
>> penalty when accessing remote images on NFS or similar exports.
>>
>> This patch effectively implements a readahead of 2 * cluster_size,
>> which is 2 * 64 kB by default, resulting in a 128 kB readahead.
>> This is a common setting on Linux, for instance.
>>
>> For example, this leads to the following times when converting a
>> compressed qcow2 image to a local tmpfs partition.
>>
>> Old:
>> time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
>> real 0m24.681s
>> user 0m8.597s
>> sys 0m4.084s
>>
>> New:
>> time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
>> real 0m16.121s
>> user 0m7.932s
>> sys 0m2.244s
>>
>> Signed-off-by: Peter Lieven
>> ---
>>  block/qcow2-cluster.c | 27 +++++++++++++++++++++++++--
>>  block/qcow2.h         |  1 +
>>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> I like this idea, but here's a question. Actually, this penalty is
> common to all protocol drivers: curl, gluster, whatever. Readahead is
> not only good for compression processing, but also quite helpful for
> boot: the BIOS and GRUB may send sequential 1-sector I/Os,
> synchronously, and thus suffer from the high latency of network
> communication. So if we want to do this, I think we will want to
> share it with other format and protocol combinations.

I had the same idea in mind. It's not only the high latency, but also
the high I/O load on the storage: reading sectors one by one produces
a high IOPS rate. But we have to be very careful:

- It's likely that the OS already does a readahead, in which case we
  should not add that complexity to qemu.
- We would definitely destroy zero-copy functionality.

My idea would be to do a readahead only if we observe a read smaller
than n bytes, and then round the request up to that size. Maybe we
should only apply this logic for 1-sector reads and then read e.g.
4 kB. In any case this has to be an opt-in feature.

If I have some time I will collect a histogram of transfer size
versus timing while booting popular OSes.

Peter