From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57839) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZD6se-0005yg-3K for qemu-devel@nongnu.org; Thu, 09 Jul 2015 04:08:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZD6sY-0005Ui-Im for qemu-devel@nongnu.org; Thu, 09 Jul 2015 04:08:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60986) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZD6sY-0005Ua-Af for qemu-devel@nongnu.org; Thu, 09 Jul 2015 04:08:22 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id B7510C12B6 for ; Thu, 9 Jul 2015 08:08:21 +0000 (UTC) Date: Thu, 9 Jul 2015 09:08:17 +0100 From: "Dr. David Alan Gilbert" Message-ID: <20150709080817.GB2359@work-vm> References: <1436365567-27164-1-git-send-email-dgilbert@redhat.com> <87wpyan2ce.fsf@neno.neno> <20150708165537.GA28708@work-vm> <87lheqmpuf.fsf@neno.neno> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lheqmpuf.fsf@neno.neno> Subject: Re: [Qemu-devel] [PATCH] RDMA: Reduce restriction on block length match List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Juan Quintela Cc: amit.shah@redhat.com, "Dr. David Alan Gilbert" , qemu-devel@nongnu.org * Juan Quintela (quintela@redhat.com) wrote: > "Dr. David Alan Gilbert" wrote: > > * Juan Quintela (quintela@redhat.com) wrote: > >> "Dr. David Alan Gilbert (git)" wrote: > >> > From: "Dr. David Alan Gilbert" > >> > > >> > My e4d633207 patch has an over zealous sanity check that checked > >> > the lengths of the RAM Blocks on source/destination were the same. This > >> > isn't true because of the 'used_length' trick for RAM blocks like the > >> > ACPI table that vary in size. > >> > > >> > Prior to that patch RDMA would also fail in this case, but it should > >> > now work with the changes in the set e4d633207 is in. > >> > > >> > Signed-off-by: Dr. David Alan Gilbert > >> > > >> > Fixes: e4d633207c129dc5b7d145240ac4a1997ef3902f > >> > --- > >> > migration/rdma.c | 13 +++++++------ > >> > 1 file changed, 7 insertions(+), 6 deletions(-) > >> > > >> > diff --git a/migration/rdma.c b/migration/rdma.c > >> > index f106b2a..1d094b0 100644 > >> > --- a/migration/rdma.c > >> > +++ b/migration/rdma.c > >> > @@ -3338,14 +3338,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, > >> > for (i = 0; i < nb_dest_blocks; i++) { > >> > network_to_dest_block(&rdma->dest_blocks[i]); > >> > > >> > - /* We require that the blocks are in the same order */ > >> > + /* We require that the blocks are in the same order, > >> > + * but the used_length trick for acpi blocks means that > >> > + * the destination can validly be larger than the source > >> > + */ > >> > if (rdma->dest_blocks[i].length != local->block[i].length) { > >> > >> Should we change the check to be that destination is bigger or equal > >> than source? > >> > >> With your change, we only remove the check? > > > > I'm actually going to drop this change; so keep the error if they're > > different. > > > > My argument works like this (I've not yet found a good way to test it): > > > > 1) The source sends to the destination a list of RAM blocks in the qemu-file stream > > 2) The destination performs a resize on the RAM blocks to match the source > > so at this point the destination's block sizes should match. > > Humm, I *thought* that what the resize does is getting it bigger, but if > destination is bigger, it does nothing, no? The code in migration/ram.c calls qemu_ram_resize for length != block->used_length and similalry qemu_ram_resize always seems to set block->used_length. I'm going to have to have more of a dig into this and figure out what's going on. Dave > > > > 3) The source sends a series of RDMA block registration requests for the RAM > > 4) The destination sends a list of RAM registrations back to the source > > 5) This check is checking that this destination list matches the local list > > 6) As long as (4) happens after (2) then the size that the destination sees > > should always match the source. > > 7) I think 4 is after 2 due to a qemu_fflush > > > > So keeping this check guards against 7 not really being true and/or > > the destination populating it's list of blocks prior to (2) - which I have > > a sneaky feeling might be happening, but am not sure yet. > > > > > > > Dave > > > >> > >> Thanks, Juan. > >> > >> > >> > - ERROR(errp, "Block %s/%d has a different length %" PRIu64 > >> > - "vs %" PRIu64, local->block[i].block_name, i, > >> > - local->block[i].length, > >> > + fprintf(stderr, "INFO: Block %s/%d has a different length %" > >> > + PRIu64 "vs %" PRIu64, local->block[i].block_name, > >> > + i, local->block[i].length, > >> > rdma->dest_blocks[i].length); > >> > - rdma->error_state = -EINVAL; > >> > - return -EINVAL; > >> > } > >> > local->block[i].remote_host_addr = > >> > rdma->dest_blocks[i].remote_host_addr; > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK