From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5166D826.207@linux.vnet.ibm.com>
Date: Thu, 11 Apr 2013 11:35:02 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
References: <1365632901-15470-1-git-send-email-mrhines@linux.vnet.ibm.com>
 <1365632901-15470-11-git-send-email-mrhines@linux.vnet.ibm.com>
 <20130411073843.GB19601@redhat.com> <51667FEE.903@redhat.com>
 <5166B9A9.9070904@linux.vnet.ibm.com> <5166C59A.4010904@redhat.com>
 <5166CF56.2060105@linux.vnet.ibm.com> <5166D1DA.3050804@redhat.com>
In-Reply-To: <5166D1DA.3050804@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v1: 10/13] introduce new command migrate_check_for_zero
To: Paolo Bonzini
Cc: aliguori@us.ibm.com, "Michael S. Tsirkin", qemu-devel@nongnu.org,
 owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com

On 04/11/2013 11:08 AM, Paolo Bonzini wrote:
> On 11/04/2013 16:57, Michael R. Hines wrote:
>> We have hardware already with front side bus speeds of 13 GB/s.
>>
>> We also already have 5 GB/s RDMA hardware, and we will likely
>> have even faster RDMA hardware in the future.
>>
>> This analysis is not factoring into account the cycles it takes to
>> map the pages before they are checked for duplicate bytes,
> Do you mean the TLB misses?

Keeping in mind that this primarily happens during the bulk-phase
round, then yes: both the TLB misses and the time it takes to trap
into the kernel, map the page, and let the TLB re-walk the page table.

But, as you pointed out, I do concede that since most of the pages
will already have been mapped after the bulk-phase round, this should
not be a problem anymore *after* that round has finished.

Using the /proc//pagemap will probably go much further towards solving
the problem than disabling zero page scanning. If it's already possible
to know that a page is not mapped, then there won't be any need to scan
it in the first place.

Once the page is already mapped, yes, I do see clearly that the
overhead of is_dup_page() would probably be minimal.

Nevertheless, the initial "burst" of the bulk-phase round is still
important to optimize, and I would like to know whether the maintainer
would accept this API for disabling the scan or not.

We think it's important because the total migration time can be much
smaller with high-throughput RDMA links when the bulk-phase round is
optimized, and that lower total migration time is very valuable to
many of our workloads, in addition to the low-downtime benefits you
get with RDMA.

> These are the real world scenarios that I was talking about. Do you
> have profiles of these, with the latest QEMU code, that show
> is_dup_page() to be expensive?

I have numbers showing is_dup_page() to be expensive only for the
bulk-phase round.

Other than that, I would be breaking confidentiality outside of the
paper we have already published.
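
For anyone following the thread without the code open, the scan under
discussion amounts to checking whether every byte of a page has the
same value. A minimal sketch of that check is below. It is a stand-in
written for this mail, not QEMU's actual is_dup_page(), and the name
page_is_dup() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if every byte in the buffer equals the first byte.
 * Hypothetical stand-in for the kind of scan is_dup_page() performs;
 * not QEMU's actual implementation.
 */
static bool page_is_dup(const uint8_t *page, size_t len)
{
    if (len < 2)
        return true;

    /*
     * Comparing the buffer with itself shifted by one byte checks
     * page[i] == page[i + 1] for every i, which holds only when all
     * bytes are identical.
     */
    return memcmp(page, page + 1, len - 1) == 0;
}
```

The point relevant to this discussion is that the check has to read
the page, so on a page that has never been touched the read itself
faults the page in before any byte is compared.
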