From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54C64B0D.3040304@redhat.com>
Date: Mon, 26 Jan 2015 09:11:25 -0500
From: Max Reitz
References: <201501262119592629551@sangfor.com.cn>
In-Reply-To: <201501262119592629551@sangfor.com.cn>
Subject: Re: [Qemu-devel] [RFC] optimization for qcow2 cache get/put
To: Zhang Haoyu, qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi

On 2015-01-26 at 08:20, Zhang Haoyu wrote:
> Hi, all
>
> Regarding very large qcow2 images, e.g., 2 TB, a long disruption
> happens when taking a snapshot, caused by cache updates and I/O waits.
> The perf top data is shown below:
>
>    PerfTop:  2554 irqs/sec  kernel: 0.4%  exact: 0.0%  [4000Hz cycles],  (target_pid: 34294)
> ------------------------------------------------------------------------------------------------------------------------
>
>     33.80%  qemu-system-x86_64   [.] qcow2_cache_do_get
>     27.59%  qemu-system-x86_64   [.] qcow2_cache_put
>     15.19%  qemu-system-x86_64   [.] qcow2_cache_entry_mark_dirty
>      5.49%  qemu-system-x86_64   [.] update_refcount
>      3.02%  libpthread-2.13.so   [.] pthread_getspecific
>      2.26%  qemu-system-x86_64   [.] get_refcount
>      1.95%  qemu-system-x86_64   [.] coroutine_get_thread_state
>      1.32%  qemu-system-x86_64   [.] qcow2_update_snapshot_refcount
>      1.20%  qemu-system-x86_64   [.] qemu_coroutine_self
>      1.16%  libz.so.1.2.7        [.] 0x0000000000003018
>      0.95%  qemu-system-x86_64   [.] qcow2_update_cluster_refcount
>      0.91%  qemu-system-x86_64   [.] qcow2_cache_get
>      0.76%  libc-2.13.so         [.] 0x0000000000134e49
>      0.73%  qemu-system-x86_64   [.] bdrv_debug_event
>      0.16%  qemu-system-x86_64   [.] pthread_getspecific@plt
>      0.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
>      0.10%  qemu-system-x86_64   [.] vga_draw_line24_32
>      0.09%  [vdso]               [.] 0x000000000000060c
>      0.09%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>      0.08%  [kernel]             [k] do_blockdev_direct_IO
>
> If the cache table size is expanded, the I/O decreases, but the lookup
> time grows, so it is worth optimizing the qcow2 cache get and put
> algorithms.
>
> My proposal:
>
> get:
> Use ((offset >> cluster_bits) % c->size) to locate the cache entry.
> Raw implementation:
>
>     index = (offset >> cluster_bits) % c->size;
>     if (c->entries[index].offset == offset) {
>         goto found;
>     }
>
> replace:
>
>     c->entries[(offset >> cluster_bits) % c->size].offset = offset;

Well, direct-mapped caches do have their benefits, but remember that
they do have disadvantages, too. Regarding CPU caches, set-associative
caches seem to be largely favored, so that may be a better idea.

CC'ing Kevin, because it's his code.

Max

> ...
>
> put:
> Use a 64-entry cache table to cache the recently fetched c->entries,
> i.e., a cache for the cache; then, during put, first search the
> 64-entry cache, and only fall back to searching c->entries if the
> entry is not found there.
>
> Any idea?
>
> Thanks,
> Zhang Haoyu
>
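
To make the proposed get path concrete, here is a minimal, self-contained
sketch of a direct-mapped lookup along the lines of the quoted proposal.
All names (DirectCache, TABLE_SIZE, CLUSTER_BITS) are illustrative
assumptions, not QEMU's actual Qcow2Cache API; main() shows both the O(1)
hit and the eviction that occurs whenever two offsets share an index.

    /*
     * Minimal sketch of the proposed direct-mapped lookup.  Hypothetical
     * names and sizes; not QEMU's Qcow2Cache structures.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define CLUSTER_BITS 16          /* assume 64 KiB clusters */
    #define TABLE_SIZE   64          /* number of cache entries */

    typedef struct {
        int64_t offset;              /* cached cluster offset, -1 if empty */
    } DirectEntry;

    typedef struct {
        DirectEntry entries[TABLE_SIZE];
    } DirectCache;

    /* Direct-mapped: each offset can only ever live in one slot. */
    static int direct_index(int64_t offset)
    {
        return (offset >> CLUSTER_BITS) % TABLE_SIZE;
    }

    static int direct_lookup(DirectCache *c, int64_t offset)
    {
        int index = direct_index(offset);

        if (c->entries[index].offset == offset) {
            return index;            /* hit: O(1), no scan over the table */
        }
        return -1;                   /* miss */
    }

    static void direct_replace(DirectCache *c, int64_t offset)
    {
        /* Whatever occupied the slot before is evicted unconditionally. */
        c->entries[direct_index(offset)].offset = offset;
    }

    int main(void)
    {
        DirectCache c;
        for (int i = 0; i < TABLE_SIZE; i++) {
            c.entries[i].offset = -1;
        }

        int64_t offset = (int64_t)42 << CLUSTER_BITS;
        direct_replace(&c, offset);
        printf("after replace:   index %d\n", direct_lookup(&c, offset));

        /* A colliding offset (same index, different cluster) evicts it. */
        direct_replace(&c, offset + ((int64_t)TABLE_SIZE << CLUSTER_BITS));
        printf("after collision: index %d\n", direct_lookup(&c, offset));
        return 0;
    }

The second lookup misses, which is the conflict-eviction behaviour the
reply is cautioning about: two hot offsets with the same index keep
throwing each other out.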
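For comparison, here is an equally hypothetical sketch of the
set-associative variant mentioned in the reply: the index selects a small
set and only that set is scanned, so offsets that map to the same set can
coexist. NUM_SETS, WAYS and the per-set LRU scheme are assumptions for
illustration only.

    /*
     * Sketch of a small set-associative cache with per-set LRU eviction.
     * Hypothetical names and sizes; not QEMU code.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define CLUSTER_BITS 16
    #define NUM_SETS     16
    #define WAYS         4           /* 4-way set-associative, 64 entries total */

    typedef struct {
        int64_t offset;              /* -1 if this way is empty */
        uint64_t lru_counter;        /* larger value = more recently used */
    } CacheWay;

    typedef struct {
        CacheWay sets[NUM_SETS][WAYS];
        uint64_t counter;
    } SetAssocCache;

    static int set_index(int64_t offset)
    {
        return (offset >> CLUSTER_BITS) % NUM_SETS;
    }

    /* Returns the way on a hit, -1 on a miss; only WAYS entries are scanned. */
    static int set_lookup(SetAssocCache *c, int64_t offset)
    {
        CacheWay *set = c->sets[set_index(offset)];

        for (int w = 0; w < WAYS; w++) {
            if (set[w].offset == offset) {
                set[w].lru_counter = ++c->counter;
                return w;
            }
        }
        return -1;
    }

    /* On a miss, evict the least recently used way within the set. */
    static void set_insert(SetAssocCache *c, int64_t offset)
    {
        CacheWay *set = c->sets[set_index(offset)];
        int victim = 0;

        for (int w = 1; w < WAYS; w++) {
            if (set[w].lru_counter < set[victim].lru_counter) {
                victim = w;
            }
        }
        set[victim].offset = offset;
        set[victim].lru_counter = ++c->counter;
    }

    int main(void)
    {
        SetAssocCache c = {0};
        for (int s = 0; s < NUM_SETS; s++) {
            for (int w = 0; w < WAYS; w++) {
                c.sets[s][w].offset = -1;
            }
        }

        /* Two offsets that would collide in a direct-mapped table coexist. */
        int64_t a = (int64_t)5 << CLUSTER_BITS;
        int64_t b = a + ((int64_t)NUM_SETS << CLUSTER_BITS);
        set_insert(&c, a);
        set_insert(&c, b);
        printf("a: way %d, b: way %d\n", set_lookup(&c, a), set_lookup(&c, b));
        return 0;
    }

The trade-off is a short scan over WAYS entries plus a little LRU
bookkeeping per set, in exchange for far fewer conflict evictions than
the direct-mapped scheme.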
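Finally, a rough sketch of the proposed put path: a 64-entry table of
recently handed-out (pointer, index) pairs is consulted before falling
back to a full scan of the main table. As before, every name here is a
hypothetical stand-in, not QEMU's implementation.

    /*
     * Sketch of the "cache for the cache" idea for put.  Hypothetical
     * names and sizes; not QEMU code.
     */
    #include <stdio.h>

    #define TABLE_SIZE   256         /* entries in the main cache */
    #define RECENT_SIZE  64          /* recently-got pointer -> index pairs */

    typedef struct {
        void *table;                 /* stands in for the cached cluster data */
    } PutEntry;

    typedef struct {
        PutEntry entries[TABLE_SIZE];
        struct {
            void *table;
            int index;
        } recent[RECENT_SIZE];
        int recent_next;
    } PutCache;

    /* get() records the (pointer, index) pair it hands out. */
    static void remember_get(PutCache *c, void *table, int index)
    {
        c->recent[c->recent_next].table = table;
        c->recent[c->recent_next].index = index;
        c->recent_next = (c->recent_next + 1) % RECENT_SIZE;
    }

    /* put() checks the small recent table first, then the full table. */
    static int put_find_index(PutCache *c, void *table)
    {
        for (int i = 0; i < RECENT_SIZE; i++) {
            if (c->recent[i].table == table) {
                return c->recent[i].index;
            }
        }
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (c->entries[i].table == table) {
                return i;
            }
        }
        return -1;
    }

    int main(void)
    {
        static char data[TABLE_SIZE][64];
        static PutCache c;

        for (int i = 0; i < TABLE_SIZE; i++) {
            c.entries[i].table = data[i];
        }
        remember_get(&c, data[200], 200);
        printf("via recent table: %d\n", put_find_index(&c, data[200]));
        printf("via full scan:    %d\n", put_find_index(&c, data[3]));
        return 0;
    }

This only helps if the entry being put back is usually among the 64 most
recently fetched ones; otherwise both tables are scanned on every put.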