From: "Michael S. Tsirkin"
Date: Wed, 20 Apr 2016 18:46:31 +0300
Subject: Re: [Qemu-devel] [RFC 00/13] Multiple fd migration support
To: Juan Quintela
Cc: qemu-devel@nongnu.org, amit.shah@redhat.com, dgilbert@redhat.com
Message-ID: <20160420154631.GA28751@redhat.com>
In-Reply-To: <1461163481-11439-1-git-send-email-quintela@redhat.com>

On Wed, Apr 20, 2016 at 04:44:28PM +0200, Juan Quintela wrote:
> Hi
>
> This patch series is "an" initial implementation of multiple fd
> migration.  This is to get something out for others to comment on;
> it is not finished at all.
>
> So far:
>
> - we create threads for each new fd
>
> - only for tcp, of course; the rest of the transports are out of
>   luck.  I need to integrate this with Daniel's channel changes.
>
> - I *think* the locking is right; at least I don't get any more
>   random lockups (and yes, it was not trivial).  And yes, I think
>   that the compression code locking is not completely correct.  I
>   think it would be much, much better to do the compression on top
>   of this (it would avoid a lot of copies), but I need to finish
>   this first.
>
> - In the last patch, I add a BIG hack to try to find out what the
>   real bandwidth is.
>
>
> Preliminary testing so far:
>
> - quite good: the latency is much better, though it still varies a
>   lot.  I think I found the problem behind the random high
>   latencies, but more testing is needed.
>
> - under load, I think our bandwidth calculations are *not*
>   completely correct (this is the way to spell it to be allowed for
>   a family audience).
>
>
> ToDo list:
>
> - bandwidth calculation: I am going to send another mail with my
>   ToDo list for migration; see there.
>
> - stats: we need better stats, per thread, etc.
>
> - synchronize less often with the worker threads.  Right now we
>   synchronize for each page; there are two obvious optimizations:
>   * send a list of pages each time we wake up an fd
>   * if we have to send a HUGE page, don't split it; just send the
>     whole page with a single send() and read it with a single
>     recv() on the destination.  My understanding is that this would
>     make transparent huge pages trivial.
>
> - measure things under bigger loads
>
> Comments, please?

Nice to see this take shape.

There's something that looks suspicious from a quick look at the
patches:

- imagine that the same page gets transmitted twice: first on one
  socket, then on a second one
- it is possible that the second update is received and handled on
  the destination before the first one

Note: you do make sure that only a single thread sends the data for
a given page at a time, but that does not seem to constrain the
order in which the data is *received*.  In that case, I suspect the
first update will overwrite the page with stale data.

A simple fix would be to change

    static int multifd_send_page(uint8_t *address)

to calculate the fd based on the address,
e.g. (long)address / PAGE_SIZE % thread_count, or to split memory
between the threads in some other way, so that updates to a given
page always travel over the same socket.
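Untested, and multifd_send_queue() below is just a stand-in for
whatever per-thread queueing the send side of your series already
does (thread_count and TARGET_PAGE_SIZE as in the patches), but the
idea is roughly:

    /* Sketch: pin each page to one channel, so two updates of the
     * same page can never race on the wire.  multifd_send_queue()
     * is a stand-in for the series' per-thread queueing helper.
     */
    static int multifd_send_page(uint8_t *address)
    {
        /* Deterministic address -> thread mapping. */
        int fd = (int)(((uintptr_t)address / TARGET_PAGE_SIZE)
                       % thread_count);

        /* Hand the page to worker thread 'fd'; it alone owns
         * that socket, so sends for this page stay ordered. */
        multifd_send_queue(fd, address);
        return fd;
    }

Any deterministic mapping would do; the only requirement is that
both updates of a page end up on the same socket, and TCP's
in-order delivery then does the rest.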
HTH

> Later, Juan.
>
> Juan Quintela (13):
>   migration: create Migration Incoming State at init time
>   migration: Pass TCP args in an struct
>   migration: [HACK] Don't create decompression threads if not enabled
>   migration: Add multifd capability
>   migration: Create x-multifd-threads parameter
>   migration: create multifd migration threads
>   migration: Start of multiple fd work
>   migration: create ram_multifd_page
>   migration: Create thread infrastructure for multifd send side
>   migration: Send the fd number which we are going to use for this page
>   migration: Create thread infrastructure for multifd recv side
>   migration: Test new fd infrastructure
>   migration: [HACK]Transfer pages over new channels
>
>  hmp.c                         |  10 ++
>  include/migration/migration.h |  13 ++
>  migration/migration.c         | 100 ++++++++----
>  migration/ram.c               | 350 +++++++++++++++++++++++++++++++++++++++++-
>  migration/savevm.c            |   3 +-
>  migration/tcp.c               |  76 ++++++++-
>  qapi-schema.json              |  29 +++-
>  7 files changed, 540 insertions(+), 41 deletions(-)
>
> --
> 2.5.5