Date: Thu, 26 Jul 2018 13:29:15 +0800
From: Peter Xu <peterx@redhat.com>
Message-ID: <20180726052915.GK2479@xz-mi>
References: <20180719121520.30026-1-xiaoguangrong@tencent.com>
 <20180719121520.30026-4-xiaoguangrong@tencent.com>
 <20180723043634.GC2491@xz-mi>
 <8ae4beeb-0c6d-04a1-189a-972bcf342656@gmail.com>
 <20180723080559.GI2491@xz-mi>
 <20180725164401.GD2365@work-vm>
In-Reply-To: <20180725164401.GD2365@work-vm>
Subject: Re: [Qemu-devel] [PATCH v2 3/8] migration: show the statistics of compression
To: "Dr. David Alan Gilbert"
Cc: Xiao Guangrong, pbonzini@redhat.com, mst@redhat.com,
 mtosatti@redhat.com, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 wei.w.wang@intel.com, jiang.biao2@zte.com.cn, eblake@redhat.com,
 Xiao Guangrong

On Wed, Jul 25, 2018 at 05:44:02PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, Jul 23, 2018 at 03:39:18PM +0800, Xiao Guangrong wrote:
> > > On 07/23/2018 12:36 PM, Peter Xu wrote:
> > > > On Thu, Jul 19, 2018 at 08:15:15PM +0800, guangrong.xiao@gmail.com wrote:
> > > > > @@ -1597,6 +1608,24 @@ static void migration_update_rates(RAMState *rs, int64_t end_time)
> > > > >                  rs->xbzrle_cache_miss_prev) / iter_count;
> > > > >              rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
> > > > >          }
> > > > > +
> > > > > +        if (migrate_use_compression()) {
> > > > > +            uint64_t comp_pages;
> > > > > +
> > > > > +            compression_counters.busy_rate = (double)(compression_counters.busy -
> > > > > +                rs->compress_thread_busy_prev) / iter_count;
> > > >
> > > > Here I'm not sure it's correct...
> > > >
> > > > "iter_count" stands for RAMState.iterations.  It's increased per
> > > > ram_find_and_save_block(), so IMHO it might contain multiple guest
> > >
> > > ram_find_and_save_block() returns if a page is successfully posted, and
> > > it only posts one page out at a time.
> >
> > ram_find_and_save_block() calls ram_save_host_page(), and we should be
> > sending multiple guest pages in ram_save_host_page() if the host page
> > is a huge page?
> >
> > > > pages.  However compression_counters.busy should be per guest page.
> > >
> > > Actually, it's derived from xbzrle_counters.cache_miss_rate:
> > >
> > >         xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
> > >             rs->xbzrle_cache_miss_prev) / iter_count;
> >
> > Then this is suspicious to me too...
>
> Actually, I think this isn't totally wrong; iter_count is the *difference*
> in iterations since the last time it was updated:
>
>     uint64_t iter_count = rs->iterations - rs->iterations_prev;
>
>     xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
>         rs->xbzrle_cache_miss_prev) / iter_count;
>
> so this is:
>
>        cache-misses-since-last-update
>        ------------------------------
>        iterations-since-last-update
>
> so the 'miss_rate' is ~misses / iteration.
> Although that doesn't really correspond to time.

I'm not sure I got the idea here; the thing is that I think the two
counters are kept at different granularities, which might be
problematic:

- xbzrle_counters.cache_miss is updated in save_xbzrle_page(), so it
  is per-guest-page granularity

- RAMState.iterations is updated once per ram_find_and_save_block(),
  so it is per-host-page granularity

An example: when we migrate a 2M huge page in the guest, we will only
increase RAMState.iterations by 1 (since ram_find_and_save_block()
will be called once), but we might increase xbzrle_counters.cache_miss
2M/4K=512 times (we'll call save_xbzrle_page() that many times) if all
the pages got cache misses.  Then IMHO the cache miss rate will be
512/1=51200% (while it should actually be just 100% cache miss).

Regards,

-- 
Peter Xu
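
For reference, the arithmetic in the huge-page example above can be
checked with a self-contained sketch (plain C with made-up counter
values; this is not QEMU code, and the variables merely mirror the
fields discussed in the thread):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* One 2M guest huge page backed by 4K guest pages:
         * ram_find_and_save_block() runs once, while save_xbzrle_page()
         * runs 2M/4K = 512 times (assume every page misses the cache). */
        uint64_t iterations = 1, iterations_prev = 0;   /* per host page  */
        uint64_t cache_miss = 512, cache_miss_prev = 0; /* per guest page */

        uint64_t iter_count = iterations - iterations_prev;
        double miss_rate = (double)(cache_miss - cache_miss_prev) / iter_count;

        /* Prints 51200%, even though each guest page missed exactly once. */
        printf("cache_miss_rate = %.0f%%\n", miss_rate * 100);
        return 0;
    }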