Message-ID: <513E06E8.1080706@linux.vnet.ibm.com>
Date: Mon, 11 Mar 2013 12:31:36 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
References: <1362976414-21396-1-git-send-email-mrhines@us.ibm.com> <1362976414-21396-8-git-send-email-mrhines@us.ibm.com> <513DE341.80209@redhat.com>
In-Reply-To: <513DE341.80209@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v3: 07/10] Send the actual pages over RDMA.
To: Paolo Bonzini
Cc: aliguori@us.ibm.com, mst@redhat.com, michael.r.hines.mrhines@linux.vnet.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com

Acknowledged all...

On 03/11/2013 09:59 AM, Paolo Bonzini wrote:
> On 11/03/2013 05:33, Michael.R.Hines.mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines"
>>
>> For performance reasons, dup_page() and xbzrle() is skipped because
>> they are too expensive for zero-copy RDMA.
>>
>> Signed-off-by: Michael R. Hines
>> ---
>>  arch_init.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 56 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 8daeafa..437cb47 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -45,6 +45,7 @@
>>  #include "exec/address-spaces.h"
>>  #include "hw/pcspk.h"
>>  #include "migration/page_cache.h"
>> +#include "migration/rdma.h"
>>  #include "qemu/config-file.h"
>>  #include "qmp-commands.h"
>>  #include "trace.h"
>> @@ -245,6 +246,18 @@ uint64_t norm_mig_pages_transferred(void)
>>      return acct_info.norm_pages;
>>  }
>>
>> +/*
>> + * RDMA does not use the buffered_file,
>> + * but we still need a way to do accounting...
>> + */
>> +uint64_t delta_norm_mig_bytes_transferred(void)
>> +{
>> +    static uint64_t last_norm_pages = 0;
>> +    uint64_t delta_bytes = (acct_info.norm_pages - last_norm_pages) * TARGET_PAGE_SIZE;
>> +    last_norm_pages = acct_info.norm_pages;
>> +    return delta_bytes;
>> +}
>> +
>>  uint64_t xbzrle_mig_bytes_transferred(void)
>>  {
>>      return acct_info.xbzrle_bytes;
>> @@ -282,6 +295,45 @@ static size_t save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>      return size;
>>  }
>>
>> +static size_t save_rdma_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>> +                             int cont)
>> +{
>> +    int ret;
>> +    size_t bytes_sent = 0;
>> +    ram_addr_t current_addr;
>> +    RDMAData * rdma = &migrate_get_current()->rdma;
>> +
>> +    acct_info.norm_pages++;
>> +
>> +    /*
>> +     * use RDMA to send page
>> +     */
> Not quite true, the page is added to the current chunk. Please make the
> comments a quick-and-dirty reference of the protocol, or leave them out
> altogether.
>
>> +    current_addr = block->offset + offset;
>> +    if ((ret = qemu_rdma_write(rdma, current_addr, TARGET_PAGE_SIZE)) < 0) {
>> +        fprintf(stderr, "rdma migration: write error! %d\n", ret);
>> +        qemu_file_set_error(f, ret);
>> +        return ret;
>> +    }
>> +
>> +    /*
>> +     * do some polling
>> +     */
> Again, that's quite self-evident. Poll for what though? :)
>
>> +    while (1) {
>> +        int ret = qemu_rdma_poll(rdma);
>> +        if (ret == RDMA_WRID_NONE) {
>> +            break;
>> +        }
>> +        if (ret < 0) {
>> +            fprintf(stderr, "rdma migration: polling error! %d\n", ret);
>> +            qemu_file_set_error(f, ret);
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    bytes_sent += TARGET_PAGE_SIZE;
>> +    return bytes_sent;
>> +}
> As written in the other message, I think this should be an additional
> QEMUFile operation, hopefully the same that Orit is introducing in her
> patches.
>
>>  #define ENCODING_FLAG_XBZRLE 0x1
>>
>>  static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
>> @@ -462,7 +514,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>
>>              /* In doubt sent page as normal */
>>              bytes_sent = -1;
>> -            if (is_dup_page(p)) {
>> +            if (migrate_use_rdma()) {
>> +                /* searching for zeros is still too expensive for RDMA */
>> +                bytes_sent = save_rdma_page(f, block, offset, cont);
> Again as written in the other message, this is not really an RDMA thing,
> it's mostly the effect of a fast link. Of course to some extent it
> depends on the CPU and RAM speed, but we can fake that it isn't.
>
>> +            } else if (is_dup_page(p)) {
>>                  acct_info.dup_pages++;
>>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>>                                              RAM_SAVE_FLAG_COMPRESS);
>>
> Thanks,
>
> Paolo
>
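
The delta accounting done by the quoted delta_norm_mig_bytes_transferred() can be sketched in isolation as below. This is only a minimal illustration under stated assumptions, not the patch itself: PAGE_SIZE_BYTES stands in for TARGET_PAGE_SIZE, the global norm_pages stands in for acct_info.norm_pages, and the name delta_norm_bytes() is made up for this example.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stdint.h>

/* Assumption: a fixed 4 KiB page; the patch uses TARGET_PAGE_SIZE. */
#define PAGE_SIZE_BYTES 4096u

/* Assumption: stands in for acct_info.norm_pages in the patch. */
static uint64_t norm_pages;

/*
 * Return the number of bytes accounted since the previous call.
 * The function-local static remembers the page count seen last time,
 * so each call reports only the pages counted in between.
 */
static uint64_t delta_norm_bytes(void)
{
    static uint64_t last_norm_pages;
    uint64_t delta_bytes = (norm_pages - last_norm_pages) * PAGE_SIZE_BYTES;

    last_norm_pages = norm_pages;
    return delta_bytes;
}
```

Calling the helper twice with no pages counted in between returns 0 the second time. Because the state lives in a function-local static, this pattern suits a single caller; a second caller would consume the first caller's delta.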