Date: Wed, 4 May 2016 13:47:12 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20160504124711.GG2302@work-vm>
References: <87oa8mf4sj.fsf@emacs.mitica>
In-Reply-To: <87oa8mf4sj.fsf@emacs.mitica>
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
To: Juan Quintela
Cc: QEMU Developer

* Juan Quintela (quintela@redhat.com) wrote:
>
> Hi
>
> I am often asked what the ToDo list for migration is. It was in my
> head, and in random notes on my desk, so I am trying some
> organization (Yes, I would put this in the wiki).

Let me add: Getting everything to use VMState; I intend to try and fix
virtio to use VMState as much as possible.
And yes, a wiki entry would be good; then people might notice it and fix
things for us :-)

> - migration thread on reception
>   would make it trivial to do other things while receiving, and would make
>   postcopy easier also (I was going to put much easier, but postcopy is
>   never easy).

I don't think it makes much difference to postcopy.

> - migration capabilities and parameters
>   this is a mess. No, it is worse than that. I don't know who is to
>   blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
>                                 int64_t compress_level,
>                                 bool has_compress_threads,
>                                 int64_t compress_threads,
>                                 bool has_decompress_threads,
>                                 int64_t decompress_threads,
>                                 bool has_x_cpu_throttle_initial,
>                                 int64_t x_cpu_throttle_initial,
>                                 bool has_x_cpu_throttle_increment,
>                                 int64_t x_cpu_throttle_increment,
>                                 bool has_multifd_threads,
>                                 int64_t multifd_threads,
>                                 Error **errp)
>
>   Can we move this to an array of structs, please, pretty please?
>   I think that for this one, the blame is on qmp

Yes; zhanghailiang had a patch to try and help that and there was some
discussion at about the same time (June last year?!)
That function is VERY delicate; if you screw up and get those in the wrong
order then everything will appear to be just fine....

> - info migrate
>   This deserves its own item. Let's see a typical output
>
> (qemu) info migrate
>
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
>
>   Aha, we have the capabilities, but not the parameters. This is
>   historical, I know, but they don't belong here.

Well, for the HMP version we can fix any of this IMHO without a problem;
let's add more detail/fix names/etc.

> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
>
> We also need to decide on some internal units. Some things are in bytes,
> some are in kilobytes, some are in pages.
> Some are in host pages, or guest pages, or who knows :-(

I don't - every time I look at some of it I end up going back to the source.

> - Block migration (the migration/block.c one). This is the bastard
>   child of migration. Much less tested, we should make a decision
>   about letting it live or deprecating it. Things needed from memory:
>   - functions should return the same values as ram.c
>     some functions don't have "exact" values, and return 1 when there
>     is more than one block dirty, etc, etc
>   - if we continue maintaining it, allow it to have _some_ shared
>     devices and some non-shared ones, instead of everything?

My vague understanding was that there were still configurations that were
only usable with block migration; mostly those things that only wanted
a single socket because they wanted to tunnel it; this might change with
Dan's TLS setup. Having said that, I don't understand all of the block
migration alternatives.

> - RDMA: Another step child
>
>   This is really, really weird. We don't use the normal infrastructure
>   for RDMA, we use the ram_control_* stuff. We should really move to
>   use the normal stuff here.

I'm not sure that's possible - while the RDMA code is huge and horribly
complex, some of that is just down to the kernel APIs and standards it has
to deal with; it might be possible to glue it into ram.c better but I
wouldn't bet on it.

> - autoconverge code: This could be used outside of migration (i.e. just
>   to slow down a guest). We should really do some measurement here to
>   see how useful it is for migration. If the guest is dirtying lots of
>   memory, we end up having to throttle the guest 90% or so :-(

Dan's doing some measurements, I think. The other question is how it
compares to using an external cgroup based converge (which I think is
what oVirt does).

> - xbzrle. We only have one cache, we should decide how to work with
>   this for multithread/compression.
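
Going back to the qmp_migrate_set_parameters() item further up: a rough
sketch of what an "array of structs" interface might look like, so that
nothing depends on argument order any more. This is only an illustration;
MigParamId, MigParamUpdate, MigParams and mig_params_apply() are made-up
names, not existing QEMU or QMP code, and every value is assumed to be an
int64_t as in the quoted prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ids, one per has_x/x argument pair in the quoted prototype. */
typedef enum {
    MIG_PARAM_COMPRESS_LEVEL,
    MIG_PARAM_COMPRESS_THREADS,
    MIG_PARAM_DECOMPRESS_THREADS,
    MIG_PARAM_X_CPU_THROTTLE_INITIAL,
    MIG_PARAM_X_CPU_THROTTLE_INCREMENT,
    MIG_PARAM_MULTIFD_THREADS,
    MIG_PARAM__MAX
} MigParamId;

/* One requested change: which parameter, and its new value. */
typedef struct {
    MigParamId id;
    int64_t value;
} MigParamUpdate;

/* Current settings, indexed by MigParamId. */
typedef struct {
    int64_t values[MIG_PARAM__MAX];
} MigParams;

/*
 * Apply n updates. All ids are validated first so that a bad entry
 * leaves *p untouched. Returns 0 on success, -1 on an unknown id.
 */
int mig_params_apply(MigParams *p, const MigParamUpdate *u, size_t n)
{
    size_t i;

    assert(p != NULL && (u != NULL || n == 0));
    for (i = 0; i < n; i++) {
        if ((int)u[i].id < 0 || (int)u[i].id >= MIG_PARAM__MAX) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        p->values[u[i].id] = u[i].value;
    }
    return 0;
}
```

A caller would pass only the entries it wants to change, e.g.
{ MIG_PARAM_COMPRESS_LEVEL, 9 }, so a parameter left out is simply not
touched and there are no has_* flags to get out of step with their values.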
>
> - When we do migration, we have spaghetti code to decide if:
>   * it is a zero page
>   * it is a duplicated page
>   * it is a xbzrle page
>   * it is a compressed page
>   And as the code is written, it is not trivial to add new "options". I
>   think that we should "re-think" what combinations are allowed and which
>   ones make no sense.

Yeh, and find a way to express to libvirt what combinations are legal.

> - savevm and migration: they use two different paths for no really good
>   reason. We should really abstract this to a single code path.
>   We always forget the savevm one when we make changes.
>
> - error handling. Every function should return an error. Every
>   function should return an error.

Yeh.

> - qemu_get_buffer() doesn't give an error if there is nothing to read,
>   sniff.
>
> - Multipage support: Welcome to the XXI century. Now almost all
>   architectures have HugePages. And others have different sized pages
>   (on PPC it is not strange for the page size of host and guest to
>   differ). We have work to do here. For starters, sending Huge pages as
>   one chunk will make TransparentHugePages happier.

Yeh, Andrea has pushed me about this a bit; the only problem I have here
is with postcopy where getting a page request stuck behind a huge page
request would do nasty things to the latency - but your multifd might fix
that.

> - Bitmaps. Related to the previous one. We should really be better about
>   walking them and about synchronising them between qemu/kernel.

Oh yes, they're a nightmare on things with different page sizes; especially
when people worry that the source and destination might have different
host page sizes.

> - COLO: We need to integrate it.
>
> I will continue the rant at some other point O:-) Just now I need to
> leave for the bar.

One that's related to that is the big lock around the last stage of
migrate; we really could do with being able to recover from a migrate
that hangs during the final stage due to a block-IO or network issue.
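
Going back to the "spaghetti code" item above: one possible shape for the
zero/duplicate/xbzrle/compressed decision is a table of encoders walked in
priority order, so that adding an option means adding a row and the enabled
flags record which combination is in use. This is purely a sketch;
PageEncoding, PageEncoder, page_is_zero(), page_always() and classify_page()
are invented names, not code from ram.c, and only the zero and raw cases are
filled in. Duplicate, xbzrle and compressed pages would be further rows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings; only two are filled in to keep the sketch short. */
typedef enum {
    PAGE_ENC_ZERO,
    PAGE_ENC_RAW
} PageEncoding;

/* One row per option: is it enabled, and does it apply to this page? */
typedef struct {
    PageEncoding enc;
    bool enabled;
    bool (*applies)(const uint8_t *page, size_t len);
} PageEncoder;

/* True if every byte of the page is zero. */
bool page_is_zero(const uint8_t *page, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Catch-all predicate for the raw encoding. */
bool page_always(const uint8_t *page, size_t len)
{
    (void)page;
    (void)len;
    return true;
}

/* Walk the table in priority order; fall back to raw if nothing matches. */
PageEncoding classify_page(const PageEncoder *tbl, size_t n,
                           const uint8_t *page, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].enabled && tbl[i].applies(page, len)) {
            return tbl[i].enc;
        }
    }
    return PAGE_ENC_RAW;
}
```

The order of the rows is the policy, which at least puts the decision in
one place instead of spreading it over nested ifs.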
> Thanks for your attention, Juan.
>
> PS. While I wrote this I just looked at the channel code from Daniel, a
> step in the right direction.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK