From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54C64B0D.3040304@redhat.com>
Date: Mon, 26 Jan 2015 09:11:25 -0500
From: Max Reitz
References: <201501262119592629551@sangfor.com.cn>
In-Reply-To: <201501262119592629551@sangfor.com.cn>
Subject: Re: [Qemu-devel] [RFC] optimization for qcow2 cache get/put
To: Zhang Haoyu, qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi

On 2015-01-26 at 08:20, Zhang Haoyu wrote:
> Hi, all
>
> Regarding very large qcow2 images, e.g., 2 TB, a long disruption
> happens when taking a snapshot, caused by cache updates and I/O waits.
> The perf top data is shown below:
>
>    PerfTop:  2554 irqs/sec  kernel: 0.4%  exact: 0.0%  [4000Hz cycles],  (target_pid: 34294)
> ------------------------------------------------------------------------------------------------------------------------
>
>     33.80%  qemu-system-x86_64   [.] qcow2_cache_do_get
>     27.59%  qemu-system-x86_64   [.] qcow2_cache_put
>     15.19%  qemu-system-x86_64   [.] qcow2_cache_entry_mark_dirty
>      5.49%  qemu-system-x86_64   [.] update_refcount
>      3.02%  libpthread-2.13.so   [.] pthread_getspecific
>      2.26%  qemu-system-x86_64   [.] get_refcount
>      1.95%  qemu-system-x86_64   [.] coroutine_get_thread_state
>      1.32%  qemu-system-x86_64   [.] qcow2_update_snapshot_refcount
>      1.20%  qemu-system-x86_64   [.] qemu_coroutine_self
>      1.16%  libz.so.1.2.7        [.] 0x0000000000003018
>      0.95%  qemu-system-x86_64   [.] qcow2_update_cluster_refcount
>      0.91%  qemu-system-x86_64   [.] qcow2_cache_get
>      0.76%  libc-2.13.so         [.] 0x0000000000134e49
>      0.73%  qemu-system-x86_64   [.] bdrv_debug_event
>      0.16%  qemu-system-x86_64   [.] pthread_getspecific@plt
>      0.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
>      0.10%  qemu-system-x86_64   [.] vga_draw_line24_32
>      0.09%  [vdso]               [.] 0x000000000000060c
>      0.09%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>      0.08%  [kernel]             [k] do_blockdev_direct_IO
>
> If the cache table size is expanded, the I/O decreases, but the lookup
> time grows, so it is worth optimizing the qcow2 cache get and put
> algorithms.
>
> My proposal:
>
> get:
> Use ((offset >> cluster_bits) % c->size) to locate the cache entry.
> Raw implementation:
>
>     index = (offset >> cluster_bits) % c->size;
>     if (c->entries[index].offset == offset) {
>         goto found;
>     }
>
> replace:
>
>     c->entries[(offset >> cluster_bits) % c->size].offset = offset;

Well, direct-mapped caches do have their benefits, but remember that
they do have disadvantages, too. Regarding CPU caches, set-associative
caches seem to be largely favored, so that may be a better idea.

CC'ing Kevin, because it's his code.

Max

> ...
>
> put:
> Use a 64-entry cache table to cache the recently fetched c->entries,
> i.e., a cache for the cache; then, during put, first search the
> 64-entry cache, and only fall back to searching c->entries if the
> entry is not found there.
>
> Any idea?
>
> Thanks,
> Zhang Haoyu
>
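
To make the proposed get path concrete, here is a minimal, self-contained
sketch of a direct-mapped lookup along the lines of the quoted proposal.
All names (DirectCache, TABLE_SIZE, CLUSTER_BITS) are illustrative
assumptions, not QEMU's actual Qcow2Cache API; main() shows both the O(1)
hit and the eviction that occurs whenever two offsets share an index.

    /*
     * Minimal sketch of the proposed direct-mapped lookup.  Hypothetical
     * names and sizes; not QEMU's Qcow2Cache structures.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define CLUSTER_BITS 16          /* assume 64 KiB clusters */
    #define TABLE_SIZE   64          /* number of cache entries */

    typedef struct {
        int64_t offset;              /* cached cluster offset, -1 if empty */
    } DirectEntry;

    typedef struct {
        DirectEntry entries[TABLE_SIZE];
    } DirectCache;

    /* Direct-mapped: each offset can only ever live in one slot. */
    static int direct_index(int64_t offset)
    {
        return (offset >> CLUSTER_BITS) % TABLE_SIZE;
    }

    static int direct_lookup(DirectCache *c, int64_t offset)
    {
        int index = direct_index(offset);

        if (c->entries[index].offset == offset) {
            return index;            /* hit: O(1), no scan over the table */
        }
        return -1;                   /* miss */
    }

    static void direct_replace(DirectCache *c, int64_t offset)
    {
        /* Whatever occupied the slot before is evicted unconditionally. */
        c->entries[direct_index(offset)].offset = offset;
    }

    int main(void)
    {
        DirectCache c;
        for (int i = 0; i < TABLE_SIZE; i++) {
            c.entries[i].offset = -1;
        }

        int64_t offset = (int64_t)42 << CLUSTER_BITS;
        direct_replace(&c, offset);
        printf("after replace:   index %d\n", direct_lookup(&c, offset));

        /* A colliding offset (same index, different cluster) evicts it. */
        direct_replace(&c, offset + ((int64_t)TABLE_SIZE << CLUSTER_BITS));
        printf("after collision: index %d\n", direct_lookup(&c, offset));
        return 0;
    }

The second lookup misses, which is the conflict-eviction behaviour the
reply is cautioning about: two hot offsets with the same index keep
throwing each other out.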
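For comparison, here is an equally hypothetical sketch of the
set-associative variant mentioned in the reply: the index selects a small
set and only that set is scanned, so offsets that map to the same set can
coexist. NUM_SETS, WAYS and the per-set LRU scheme are assumptions for
illustration only.

    /*
     * Sketch of a small set-associative cache with per-set LRU eviction.
     * Hypothetical names and sizes; not QEMU code.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define CLUSTER_BITS 16
    #define NUM_SETS     16
    #define WAYS         4           /* 4-way set-associative, 64 entries total */

    typedef struct {
        int64_t offset;              /* -1 if this way is empty */
        uint64_t lru_counter;        /* larger value = more recently used */
    } CacheWay;

    typedef struct {
        CacheWay sets[NUM_SETS][WAYS];
        uint64_t counter;
    } SetAssocCache;

    static int set_index(int64_t offset)
    {
        return (offset >> CLUSTER_BITS) % NUM_SETS;
    }

    /* Returns the way on a hit, -1 on a miss; only WAYS entries are scanned. */
    static int set_lookup(SetAssocCache *c, int64_t offset)
    {
        CacheWay *set = c->sets[set_index(offset)];

        for (int w = 0; w < WAYS; w++) {
            if (set[w].offset == offset) {
                set[w].lru_counter = ++c->counter;
                return w;
            }
        }
        return -1;
    }

    /* On a miss, evict the least recently used way within the set. */
    static void set_insert(SetAssocCache *c, int64_t offset)
    {
        CacheWay *set = c->sets[set_index(offset)];
        int victim = 0;

        for (int w = 1; w < WAYS; w++) {
            if (set[w].lru_counter < set[victim].lru_counter) {
                victim = w;
            }
        }
        set[victim].offset = offset;
        set[victim].lru_counter = ++c->counter;
    }

    int main(void)
    {
        SetAssocCache c = {0};
        for (int s = 0; s < NUM_SETS; s++) {
            for (int w = 0; w < WAYS; w++) {
                c.sets[s][w].offset = -1;
            }
        }

        /* Two offsets that would collide in a direct-mapped table coexist. */
        int64_t a = (int64_t)5 << CLUSTER_BITS;
        int64_t b = a + ((int64_t)NUM_SETS << CLUSTER_BITS);
        set_insert(&c, a);
        set_insert(&c, b);
        printf("a: way %d, b: way %d\n", set_lookup(&c, a), set_lookup(&c, b));
        return 0;
    }

The trade-off is a short scan over WAYS entries plus a little LRU
bookkeeping per set, in exchange for far fewer conflict evictions than
the direct-mapped scheme.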
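Finally, a rough sketch of the proposed put path: a 64-entry table of
recently handed-out (pointer, index) pairs is consulted before falling
back to a full scan of the main table. As before, every name here is a
hypothetical stand-in, not QEMU's implementation.

    /*
     * Sketch of the "cache for the cache" idea for put.  Hypothetical
     * names and sizes; not QEMU code.
     */
    #include <stdio.h>

    #define TABLE_SIZE   256         /* entries in the main cache */
    #define RECENT_SIZE  64          /* recently-got pointer -> index pairs */

    typedef struct {
        void *table;                 /* stands in for the cached cluster data */
    } PutEntry;

    typedef struct {
        PutEntry entries[TABLE_SIZE];
        struct {
            void *table;
            int index;
        } recent[RECENT_SIZE];
        int recent_next;
    } PutCache;

    /* get() records the (pointer, index) pair it hands out. */
    static void remember_get(PutCache *c, void *table, int index)
    {
        c->recent[c->recent_next].table = table;
        c->recent[c->recent_next].index = index;
        c->recent_next = (c->recent_next + 1) % RECENT_SIZE;
    }

    /* put() checks the small recent table first, then the full table. */
    static int put_find_index(PutCache *c, void *table)
    {
        for (int i = 0; i < RECENT_SIZE; i++) {
            if (c->recent[i].table == table) {
                return c->recent[i].index;
            }
        }
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (c->entries[i].table == table) {
                return i;
            }
        }
        return -1;
    }

    int main(void)
    {
        static char data[TABLE_SIZE][64];
        static PutCache c;

        for (int i = 0; i < TABLE_SIZE; i++) {
            c.entries[i].table = data[i];
        }
        remember_get(&c, data[200], 200);
        printf("via recent table: %d\n", put_find_index(&c, data[200]));
        printf("via full scan:    %d\n", put_find_index(&c, data[3]));
        return 0;
    }

This only helps if the entry being put back is usually among the 64 most
recently fetched ones; otherwise both tables are scanned on every put.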