From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <521724A3.8050801@linux.vnet.ibm.com>
Date: Fri, 23 Aug 2013 17:00:19 +0800
From: Lei Li <lilei@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <1377069536-12658-1-git-send-email-lilei@linux.vnet.ibm.com> <1377069536-12658-14-git-send-email-lilei@linux.vnet.ibm.com> <52149B09.705@redhat.com> <52170060.4050104@linux.vnet.ibm.com> <521713DA.9010903@redhat.com>
In-Reply-To: <521713DA.9010903@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost
To: Paolo Bonzini
Cc: aarcange@redhat.com, quintela@redhat.com, qemu-devel@nongnu.org, mrhines@linux.vnet.ibm.com, Anthony Liguori, lagarcia@br.ibm.com, rcj@linux.vnet.ibm.com

On 08/23/2013 03:48 PM, Paolo Bonzini wrote:
> On 23/08/2013 08:25, Lei Li wrote:
>> On 08/21/2013 06:48 PM, Paolo Bonzini wrote:
>>> On 21/08/2013 09:18, Lei Li wrote:
>>>> Send all the ram blocks hooked by save_page, which will copy each
>>>> ram page and MADV_DONTNEED the page just copied.
>>> You should implement this entirely in the hook.
>>>
>>> It will be a little less efficient because of the dirty bitmap
>>> overhead, but you should aim at having *zero* changes in arch_init.c
>>> and migration.c.
>> Yes, the reason I modified migration_thread() to add a new process
>> that sends all the ram pages in the adjusted qemu_savevm_state_begin
>> stage and sends the device states in the qemu_savevm_device_state
>> stage for localhost migration is to avoid the dirty bitmap, which is
>> a little less efficient, just as you mentioned above.
>>
>> Performance assurance is very important for this feature; our goal is
>> 100ms of downtime for a 1TB guest.
> Do not _start_ by introducing encapsulation violations all over the
> place.
>
> Juan has been working on optimizing the dirty bitmap code. His patches
> could introduce a speedup of a factor of up to 64. Thus it is possible
> that his work will help you enough that you can work with the dirty
> bitmap.
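
(To make the mechanism under discussion concrete for the list: the
save_page hook quoted above boils down to roughly the following per
page. This is only a minimal sketch of the idea, not the actual patch;
the helper name localhost_save_page is made up here, and it assumes
'host' is page-aligned, as madvise() requires.)

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>

    /* Copy one guest ram page into the outgoing buffer, then tell the
     * kernel the source page is no longer needed, so the page is not
     * kept twice on the same host. */
    static int localhost_save_page(void *host, void *buf, size_t page_size)
    {
        memcpy(buf, host, page_size);                    /* copy ram page */
        return madvise(host, page_size, MADV_DONTNEED);  /* drop source   */
    }
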
>
> Also, this feature (not looking at the dirty bitmap if the machine is
> stopped) is not limited to localhost migration; add it later, once the
> basic vmsplice plumbing is in place. This will also let you profile the
> code and understand whether the goal is attainable.
>
> I honestly doubt that 100ms of downtime is possible while the machine
> is stopped. A 1TB guest has 2^28 = 268*10^6 pages, which you want to
> process in 100*10^6 nanoseconds. Thus, your approach would require 0.4
> nanoseconds per page, or roughly 2 clock cycles per page. This is
> impossible without _massive_ parallelization at all levels, starting
> from the kernel.
>
> As a matter of fact, 2^28 madvise system calls will take much, much
> longer than 100ms.
>
> Have you thought of using shared memory (with -mem-path) instead of
> vmsplice?

Precisely! Well, as Anthony mentioned in version 1 [1], there has been
some work by Robert Jennings on improving vmsplice() on the kernel
side [2]; a rough sketch of the sender side of that gifting path is at
the end of this mail. And yes, shared memory is an alternative; I think
the problem with shared memory is that it can't share anonymous memory.
Maybe Anthony can chime in on this, as the original idea was his. :-)

Reference links:

[1] Anthony's comments:
    https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html
[2] vmsplice support for zero-copy gifting of pages:
    http://comments.gmane.org/gmane.linux.kernel.mm/103998

> Paolo

-- 
Lei
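
P.S. For list readers not familiar with the gifting idea in [2], here
is a minimal, untested sketch of the sender side only. The helper name
gift_pages is made up; it assumes the ram chunk is page-aligned in both
address and length (a SPLICE_F_GIFT requirement), and the zero-copy
receive side, which is what the kernel work in [2] adds, is not shown:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/uio.h>

    /* Move a page-aligned chunk of guest ram into a pipe without
     * copying it.  SPLICE_F_GIFT tells the kernel that it may take
     * ownership of the pages instead of copying their contents. */
    static ssize_t gift_pages(int pipe_wr_fd, void *host, size_t len)
    {
        struct iovec iov = { .iov_base = host, .iov_len = len };
        return vmsplice(pipe_wr_fd, &iov, 1, SPLICE_F_GIFT);
    }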