From: Gary R Hook
Date: Wed, 19 Nov 2014 14:12:24 -0600
Message-ID: <546CF9A8.6070104@gmail.com>
In-Reply-To: <546CE8EC.9090908@gmail.com>
References: <546CE8EC.9090908@gmail.com>
Subject: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
To: qemu-devel@nongnu.org

Ugh, I wish I could teach Thunderbird how to reply to a newsgroup.
Apologies to Paolo for the direct note.

On 11/19/14 4:19 AM, Paolo Bonzini wrote:
> On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
>>>> This seems odd, since as far as I know the tunneling code is quite
>>>> separate from the migration code; I thought the only thing the
>>>> migration code sees differently is the file descriptors it gets
>>>> passed. (Having said that, again I don't know storage stuff, so if
>>>> this is a storage special case there may be something there...)
>>>
>>> Tunnelled migration uses the old block-migration.c code. Non-tunnelled
>>> migration uses the NBD server and block/mirror.c.
>>
>> OK, that explains that. Is that because the tunneling code can't
>> deal with tunneling the NBD server connection?
>>
>>> The main problem with the old code is that it uses a possibly
>>> unbounded amount of memory in mig_save_device_dirty and can have
>>> huge jitter if any serious workload is running in the guest.
>>
>> So that's sending dirty blocks iteratively? Not that I can see
>> when the allocations get freed; but is the amount allocated there
>> related to total disk size (as Gary suggested) or to the amount
>> of dirty blocks?
>
> It should be related to the maximum rate limit (which can be set to
> arbitrarily high values, however).

This makes no sense. The code in block_save_iterate() specifically
attempts to control the rate of transfer. But when
qemu_file_get_rate_limit() returns a number like 922337203685372723
(0xCCCCCCCCCCB3333), I'm under the impression that no bandwidth
constraint is being imposed at this layer. Why, then, is the transfer
occurring at 20MB/sec (a simple, under-utilized 1 GigE connection) with
no clear bottleneck in CPU or network? What other relation might exist?
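To make that concrete, here is a standalone sketch (plain C, not the
actual QEMU code; the 1 MiB chunk size and the helper name are my own
stand-ins, and the limit is the value from my trace). A save loop whose
only brake is a budget of that magnitude never actually throttles
anything, so the 20MB/sec ceiling must be imposed elsewhere:

  /*
   * Standalone sketch, not QEMU code.  The helper name and the 1 MiB
   * chunk size are assumptions; the only point is that a per-iteration
   * budget of 922337203685372723 bytes can never be what stops the loop.
   */
  #include <stdint.h>
  #include <stdio.h>

  #define CHUNK_SIZE (1LL << 20)          /* 1 MiB per queued block (assumed) */

  /* stand-in for what qemu_file_get_rate_limit() returned in my trace */
  static int64_t rate_limit(void)
  {
      return 922337203685372723LL;
  }

  int main(void)
  {
      int64_t disk_bytes = 25LL << 30;    /* the 25GB test disk */
      int64_t sent = 0;

      /* shape of one save iteration: queue blocks until the budget is spent */
      while (sent < disk_bytes && sent < rate_limit()) {
          sent += CHUNK_SIZE;             /* "send" one block */
      }

      if (sent >= disk_bytes)
          printf("ran out of data before the limit mattered (%lld bytes)\n",
                 (long long)sent);
      else
          printf("stopped by the rate limit at %lld bytes\n",
                 (long long)sent);
      return 0;
  }

Built with gcc and run, it exhausts the (simulated) disk long before the
limit is ever reached.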
> The reads are started, then the ones that are ready are sent and the
> blocks are freed in flush_blks. The jitter happens when the guest
> reads a lot but only writes a few blocks. In that case, the
> bdrv_drain_all in mig_save_device_dirty can be called relatively
> often, and it can be expensive because it also waits for all
> guest-initiated reads to complete.

Pardon my ignorance, but this does not match my observations. What I am
seeing is the process size of the source qemu grow steadily until the
COR completes; during this time the backing file on the destination
system does not change or grow at all, which implies that no blocks are
being transferred. (I have tested this with a 25GB VM disk, and larger;
no network activity occurs during this period.) Once the COR is done
and the in-memory copy is ready (marked by a "Completed 100%" message
from blk_mig_save_bulked_block()), the transfer begins. At an abysmally
slow rate, I'll add, per the above. Another problem to be investigated.

> The bulk phase is similar, just with different functions (the reads
> are done in mig_save_device_bulk). With a high rate limit, the total
> allocated memory can reach a few gigabytes indeed.

Much, much more than that. It's definitely dependent upon the disk file
size: tiny VM disks are a nit; big VM disks are a problem. (A toy model
of the imbalance I believe is at work is appended below my signature.)

> Depending on the scenario, a possible disadvantage of NBD migration is
> that it can only throttle each disk separately, while the old code
> will apply a single limit to all migrations.

How about no throttling at all? And just to be very clear, the goal is
fast (NBD-based) migration of VMs using non-shared storage over an
encrypted channel. Safest, worst-case scenario. Aside from gaining an
understanding of this code.

Thank you for your attention.

-- 
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc
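The toy model mentioned above: a standalone C sketch, not QEMU code.
The read and drain rates below are assumptions chosen to resemble what
I observe (fast local reads, roughly 20MB/sec over the tunnel). The
point is only that when bulk reads are queued at disk speed but drained
at channel speed, the backlog of read-but-unsent blocks approaches the
size of the disk, which is consistent with the memory growth described
above.

  /*
   * Toy model, not QEMU code.  Blocks are "read" at disk speed and
   * queued; the queue is drained at channel speed (a flush_blks-style
   * step).  The peak queue depth shows how the read-but-unsent backlog
   * scales with disk size when the channel is much slower than the
   * local read path.  All rates are assumptions.
   */
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      const int64_t disk_bytes = 25LL << 30;   /* 25 GiB disk */
      const int64_t read_rate  = 500LL << 20;  /* 500 MiB/s local reads (assumed) */
      const int64_t drain_rate = 20LL << 20;   /* ~20 MiB/s over the tunnel */

      int64_t queued = 0, read_total = 0, sent_total = 0, peak = 0;

      /* one loop pass == one second of simulated time */
      while (sent_total < disk_bytes) {
          if (read_total < disk_bytes) {       /* bulk reads keep queueing */
              int64_t r = read_rate;
              if (read_total + r > disk_bytes)
                  r = disk_bytes - read_total;
              read_total += r;
              queued += r;
          }
          int64_t d = drain_rate;              /* drain toward the wire */
          if (d > queued)
              d = queued;
          queued -= d;
          sent_total += d;
          if (queued > peak)
              peak = queued;
      }

      printf("peak backlog: %lld MiB of a %lld MiB disk\n",
             (long long)(peak >> 20), (long long)(disk_bytes >> 20));
      return 0;
  }

With those rates, the backlog peaks at nearly the full size of the
disk, i.e. the source qemu would be holding almost the entire image in
memory before the wire catches up.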