Date: Wed, 4 May 2016 18:35:22 +0200
From: Greg Kurz
Message-ID: <20160504183522.436ee284@bahia.huguette.org>
In-Reply-To: <20160504124711.GG2302@work-vm>
References: <87oa8mf4sj.fsf@emacs.mitica> <20160504124711.GG2302@work-vm>
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
To: "Dr. David Alan Gilbert"
Cc: Juan Quintela, QEMU Developer

On Wed, 4 May 2016 13:47:12 +0100 "Dr.
David Alan Gilbert" wrote:

> * Juan Quintela (quintela@redhat.com) wrote:
> >
> > Hi
> >
> > I am often asked what the ToDo list for migration is; it was in my
> > head, and in random notes on my desk, so I am trying some
> > organization (Yes, I will put this in the wiki).
>
> Let me add:
>   Getting everything to use VMState; I intend to try and fix virtio to use
>   VMState as much as possible.
>

I had tried to revive Juan's 41-patch series from 2009 some time ago but it was
really tedious and the virtio 1.0 work started and I gave up...

>   And yes, a wiki entry would be good; then people might notice it and fix things
>   for us :-)
>

But I'm still willing to help if I can. :)

> > - migration thread on reception
> >   would make it trivial to do other things while receiving, and would
> >   also make postcopy easier (I was going to put much easier, but
> >   postcopy is never easy).
>
> I don't think it makes much difference to postcopy.
>
> > - migration capabilities and parameters
> >   this is a mess. No, it is worse than that. I don't know who is to
> >   blame here, but something needs to be done:
> >
> > void qmp_migrate_set_parameters(bool has_compress_level,
> >                                 int64_t compress_level,
> >                                 bool has_compress_threads,
> >                                 int64_t compress_threads,
> >                                 bool has_decompress_threads,
> >                                 int64_t decompress_threads,
> >                                 bool has_x_cpu_throttle_initial,
> >                                 int64_t x_cpu_throttle_initial,
> >                                 bool has_x_cpu_throttle_increment,
> >                                 int64_t x_cpu_throttle_increment,
> >                                 bool has_multifd_threads,
> >                                 int64_t multifd_threads,
> >                                 Error **errp)
> >
> >   Can we move this to an array of structs, please, pretty please?
> >   I think that for this one, the blame is on qmp
>
> Yes; zhanghailiang had a patch to try and help that and there was
> some discussion at about the same time (June last year?!)
> That function is VERY delicate; if you screw up and get those in the
> wrong order then everything will appear to be just fine....
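To make the "array of structs" idea concrete, here is a minimal sketch of
what it could look like. Everything in it is an assumption for illustration:
MigParamDesc, MigParamUpdate, mig_param_find() and mig_params_set() are
hypothetical names, not existing QEMU code, and the ranges and defaults are
made-up numbers. Only the parameter names come from the signature quoted
above. The point is that the caller names each value it sets, so there is
no argument order to get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one tunable; not an existing QEMU type. */
typedef struct MigParamDesc {
    const char *name;
    int64_t min;    /* inclusive bounds, illustrative values only */
    int64_t max;
    int64_t value;  /* current setting, illustrative defaults */
} MigParamDesc;

/* One requested change: the caller lists only what it wants to set. */
typedef struct MigParamUpdate {
    const char *name;
    int64_t value;
} MigParamUpdate;

static MigParamDesc mig_params[] = {
    { "compress_level",           0,   9,  1 },
    { "compress_threads",         1, 255,  8 },
    { "decompress_threads",       1, 255,  2 },
    { "x_cpu_throttle_initial",   1,  99, 20 },
    { "x_cpu_throttle_increment", 1,  99, 10 },
    { "multifd_threads",          1, 255,  2 },
};
#define MIG_PARAMS_COUNT (sizeof(mig_params) / sizeof(mig_params[0]))

/* Look a parameter up by name; returns NULL if it does not exist. */
MigParamDesc *mig_param_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < MIG_PARAMS_COUNT; i++) {
        if (strcmp(mig_params[i].name, name) == 0) {
            return &mig_params[i];
        }
    }
    return NULL;
}

/*
 * Validate every update first, then apply them all, so that one bad
 * entry leaves the whole table untouched.  Returns 0 on success, -1 on
 * an unknown name or an out-of-range value.
 */
int mig_params_set(const MigParamUpdate *upd, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const MigParamDesc *d = mig_param_find(upd[i].name);
        if (d == NULL || upd[i].value < d->min || upd[i].value > d->max) {
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        mig_param_find(upd[i].name)->value = upd[i].value;
    }
    return 0;
}
```

A caller would pass something like { "compress_level", 9 } and nothing else.
Adding a new parameter then means adding one table row rather than two more
positional arguments.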
>
> > - info migrate
> >   This deserves its own item. Let's see a typical output
> >
> >   (qemu) info migrate
> >
> >   capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
> >
> >   Aha, we have the capabilities, but not the parameters. This is
> >   historical, I know, but it doesn't belong here.
>
> Well, for the HMP version we can fix any of this IMHO without a problem;
> let's add more detail/fix names/etc.
>
> >   And we still have more optional information that appears if we are doing
> >   block migration, xbzrle, compression, rdma, etc, etc.
> >
> >   We also need to decide on some internal units. Some things are in bytes,
> >   some are in kilobytes, some are in pages. Some are in host pages, or
> >   guest pages, or who knows :-(
>
> I don't - every time I look at some of it I end up going back to the source.
>
> > - Block migration (the migration/block.c one). This is the bastard
> >   child of migration. Much less tested, we should make a decision
> >   about letting it live or deprecating it. Things needed from memory:
> >   - functions should return the same values as ram.c
> >     some functions don't have "exact" values, and return 1 when there
> >     is more than one block dirty, etc, etc
> >   - if we continue maintaining it, should we allow it to have _some_
> >     shared devices and some non-shared ones, instead of everything?
>
> My vague understanding was that there were still configurations that were
> only useable with block migration; mostly those things that only wanted
> a single socket because they wanted to tunnel it; this might change with
> Dan's TLS setup.
> Having said that, I don't understand all of the block migration alternatives.
>
> > - RDMA: Another step child
> >
> >   This is really, really weird. We don't use the normal infrastructure
> >   for RDMA, we use the ram_control_* stuff. We should really move to
> >   use the normal stuff here.
>
> I'm not sure that's possible - while the RDMA code is huge and horribly
> complex, some of that is just down to the kernel APIs and standards it
> has to deal with; it might be possible to glue it into ram.c better
> but I wouldn't bet on it.
>
> > - autoconverge code: This could be used outside of migration (i.e. just
> >   to slow down a guest). We should really do some measurement here to
> >   see how useful it is for migration. If the guest is doing lots of
> >   memory dirtying, we end up having to throttle the guest 90% or so :-(
>
> Dan's doing some I think. The other question is how it compares to using
> an external cgroup based converge (which I think is what oVirt does).
>
> > - xbzrle. We only have one cache, we should decide how to work with
> >   this for multithread/compression.
> >
> > - When we do migration, we have spaghetti code to decide if:
> >   * it is a zero page
> >   * it is a duplicated page
> >   * it is a xbzrle page
> >   * it is a compressed page
> >   And as the code is written, it is not trivial to add new "options". I
> >   think that we should "re-think" what combinations are allowed and which
> >   ones make no sense.
>
> Yeh, and find a way to express to libvirt what combinations are legal.
>
> > - savevm and migration: they use two different paths for no really good
> >   reason. We should really abstract this to a single code path.
> We always forget the savevm one when we do changes.
> >
> > - error handling. Every function should return an error. Every
> >   function should return an error.
>
> Yeh.
>
> > - qemu_get_buffer() doesn't give an error if there is nothing to read,
> >   sniff.
> >
> > - Multipage support: Welcome to the XXI century. Now almost all
> >   architectures have HugePages. And others have different sized pages
> >   (on PPC it is not strange for the page sizes of host and guest to
> >   differ). We have work to do here. For starters, sending Huge pages
> >   as one chunk will make TransparentHugePages happier.
>
> Yeh, Andrea has pushed me about this a bit; the only problem I have
> here is with postcopy where getting a page request stuck behind a huge
> page request would do nasty things to the latency - but your multifd might
> fix that.
>
> > - Bitmaps. Related to the previous one. We should really be better about
> >   walking them and about synchronising them between qemu/kernel.
>
> Oh yes, they're a nightmare on things with different page sizes; especially
> when people worry that the source and destination might have different host
> page sizes.
>
> > - COLO: We need to integrate it.
> >
> > I will continue the rant at some other point O:-) Just now I need to
> > leave for the bar.
>
> One that's related to that, is the big-lock around the last stage of migrate;
> we really could do with being able to recover from a migrate that hangs during
> the final stage due to a block-IO or network issue.
>
> > Thanks for your attention, Juan.
> >
> > PS. While I wrote this I just looked at the channel code from Daniel, a
> > step in the right direction.
>
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
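On the bitmap item above, the different page sizes problem comes down to
translating a dirty bitmap from one granularity to another. Below is a
minimal sketch of that translation, assuming the large page size is an exact
multiple of the small one. The helpers bm_test(), bm_set(), bm_coarsen() and
bm_refine() are hypothetical names written for this example, not QEMU's
bitmap API, and "ratio" stands for large_page_size / small_page_size
(16 for 4K pages inside 64K pages).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if the given bit is set in the bitmap, 0 otherwise. */
int bm_test(const unsigned long *bm, size_t bit)
{
    return (int)((bm[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/* Set the given bit in the bitmap. */
void bm_set(unsigned long *bm, size_t bit)
{
    bm[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/*
 * Small pages -> large pages: a large page is dirty if any small page
 * inside it is dirty.  dst must be zeroed by the caller and hold at
 * least (src_bits + ratio - 1) / ratio bits.
 */
void bm_coarsen(const unsigned long *src, size_t src_bits,
                unsigned long *dst, size_t ratio)
{
    assert(ratio > 0);
    for (size_t i = 0; i < src_bits; i++) {
        if (bm_test(src, i)) {
            bm_set(dst, i / ratio);
        }
    }
}

/*
 * Large pages -> small pages: every small page inside a dirty large
 * page must be treated as dirty.  dst must hold src_bits * ratio bits.
 */
void bm_refine(const unsigned long *src, size_t src_bits,
               unsigned long *dst, size_t ratio)
{
    assert(ratio > 0);
    for (size_t i = 0; i < src_bits; i++) {
        if (bm_test(src, i)) {
            for (size_t k = 0; k < ratio; k++) {
                bm_set(dst, i * ratio + k);
            }
        }
    }
}
```

Note that the round trip loses information: coarsening and then refining
marks every small page in a touched large page as dirty, which is one reason
mismatched host page sizes on source and destination cost extra traffic.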